This industrial CIFRE PhD thesis addresses automatic speaker verification (ASV) issues in the context of embedded applications. The first part of the thesis, Part A, focuses on more traditional problems and topics. Its first work investigates the minimum enrolment data requirements for a practical, text-dependent, short-utterance ASV system.
Contributions in Part A of the thesis consist of a statistical analysis whose objective is to isolate text-dependent factors and show that they are consistent across different sets of speakers. For very short utterances, the influence of specific text content on system performance can be considered a speaker-independent factor.
Part B of the thesis focuses on neural network-based solutions. While it was clear that neural networks and deep learning were becoming the state of the art in several machine learning domains, their use in embedded solutions was hindered by their complexity. Contributions described in Part B comprise blue-sky, experimental research that tackles two problems: replacing traditional, hand-crafted speaker features with processing that operates directly upon the audio waveform, and searching for optimal network architectures and weights by means of genetic algorithms. This work is the most fundamental contribution of the thesis: lightweight, neuro-evolved network structures that are able to learn from the raw audio input.