Most automatic speech recognition (ASR) systems use very complex HMMs to model speech trajectories. These models are estimated on very large databases of annotated speech and are usually built to perform well for any speaker. Specializing them to a particular speaker, condition, or gender is known to improve performance considerably when enough data are available. Unfortunately, there are rarely enough data per speaker to build a specific model from scratch. Instead, the generic speaker-independent (SI) model is modified to fit the target speaker's speech in a way that accounts for the scarcity of adaptation data. This is commonly referred to as speaker adaptation. Model complexity and data scarcity are closely intertwined.

This dissertation addresses speaker adaptation in the context of large-vocabulary speech recognition. First, the estimation of speaker-adapted models is improved by introducing constraints. The relationship between Euclidean distance and maximum likelihood in the HMM framework makes it possible to impose linear constraints effectively in the parametric HMM space. Feature-space transformation is then extended with a new closed-form solution and a Bayesian estimation formula.

Second, specific applications of speaker adaptation are introduced. Self-adaptation is modeled as a cluster identification problem. Unsupervised adaptation techniques are applied to discriminative adaptation. The interaction between speaker adaptation and noise adaptation is studied.

Lastly, I developed large-vocabulary continuous speech recognition systems during this thesis. They comprise a range of components, including a scalable acoustic training engine, language model training, self-adaptation, feature parameter normalization, and a native triphonic trigram Viterbi recognizer. The most complex component is the decoder, which is described in more detail; it is based on a new fast search algorithm.
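To make the idea of "altering the SI model with data scarcity in mind" concrete, here is a minimal sketch of the textbook MAP (maximum a posteriori) update of a single Gaussian mean. It is a standard illustration of Bayesian speaker adaptation, not the constrained or feature-space method developed in this thesis. The function name `map_adapt_mean` and the prior weight `tau` are hypothetical choices for this example, and the per-frame occupation probabilities are assumed to come from a prior alignment of the adaptation data.

```python
import numpy as np


def map_adapt_mean(si_mean, frames, posteriors, tau=10.0):
    """MAP re-estimation of one Gaussian mean (illustrative sketch).

    si_mean:    (d,) speaker-independent mean, used as the prior
    frames:     (T, d) adaptation feature vectors from the target speaker
    posteriors: (T,) occupation probabilities gamma_t of this Gaussian
    tau:        hypothetical prior weight; a larger tau trusts the SI mean more
    """
    si_mean = np.asarray(si_mean, dtype=float)
    frames = np.asarray(frames, dtype=float)
    gamma = np.asarray(posteriors, dtype=float)

    occ = gamma.sum()              # soft count of frames assigned to this Gaussian
    weighted_sum = gamma @ frames  # sum_t gamma_t * x_t, shape (d,)

    # Interpolate between the prior (SI) mean and the speaker data:
    # with little data the SI mean dominates, with much data the speaker mean does.
    return (tau * si_mean + weighted_sum) / (tau + occ)
```

With no adaptation data (all posteriors zero) the function returns the SI mean unchanged; as the soft count grows, the estimate moves toward the speaker's sample mean. This data-dependent interpolation is one common way to cope with scarce per-speaker data.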
Speaker adaptation: modelling variabilities
© EPFL. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at: http://dx.doi.org/10.5075/epfl-thesis-2661
PERMALINK : https://www.eurecom.fr/publication/1080