This paper further develops a previously proposed adaptation method for speech recognition called Symbolic Speaker Adaptation (SSA). The basic idea of SSA is to model a speaker's pronunciation as a blend of speech varieties (SVs), that is, regional dialects and foreign accents, for which the system has existing pronunciation models. During an adaptation process, the system determines the relative applicability of those models, yielding a speech variety profile (SVP) for each speaker. Speaker-dependent lexica for recognition are then derived from the speaker's SVP. In this paper, we discuss a series of experiments designed to analyze how the SSA method is affected by SV-balanced training, expanded phone inventories, reduced amounts of adaptation data, and speech from SVs not modeled by the system. The most dramatic improvements were obtained by using expanded ("SV-inclusive") phone inventories. SSA was also shown to be effective with a very small number of adaptation sentences. In addition, for speakers of novel (unseen) SVs, SSA's SV blending scheme yields higher accuracy than an SV classification scheme.
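The blending idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the SVP is estimated or how the lexica are combined, so everything below is an assumption made for illustration: the SVP is taken to be a set of normalized weights, the speaker-dependent lexicon is built by linear interpolation of per-SV pronunciation probabilities, and the names `estimate_svp`, `blend_lexica`, `min_prob`, the SV labels, and the phone strings are all hypothetical.

```python
from collections import defaultdict


def estimate_svp(sv_scores):
    """Normalize nonnegative per-SV applicability scores into weights summing to 1.

    Assumption: the scores come from some adaptation step not shown here.
    """
    total = sum(sv_scores.values())
    if total <= 0:
        raise ValueError("at least one SV score must be positive")
    return {sv: score / total for sv, score in sv_scores.items()}


def blend_lexica(sv_lexica, svp, min_prob=0.05):
    """Build a speaker-dependent lexicon by linear interpolation (an assumption).

    sv_lexica: {sv: {word: {pronunciation: probability}}}
    svp:       {sv: weight}, weights summing to 1
    min_prob:  hypothetical pruning threshold for unlikely variants
    """
    blended = defaultdict(lambda: defaultdict(float))
    for sv, weight in svp.items():
        for word, prons in sv_lexica[sv].items():
            for pron, prob in prons.items():
                blended[word][pron] += weight * prob

    lexicon = {}
    for word, prons in blended.items():
        kept = {p: v for p, v in prons.items() if v >= min_prob}
        if not kept:
            # Always keep at least the most probable variant.
            best = max(prons, key=prons.get)
            kept = {best: prons[best]}
        norm = sum(kept.values())
        lexicon[word] = {p: v / norm for p, v in kept.items()}
    return lexicon


# Hypothetical two-SV example with made-up phone strings.
sv_lexica = {
    "sv_a": {"water": {"w ao t er": 1.0}},
    "sv_b": {"water": {"w ao t ah": 1.0}},
}
svp = estimate_svp({"sv_a": 3.0, "sv_b": 1.0})
speaker_lexicon = blend_lexica(sv_lexica, svp)
```

In this sketch, a speaker whose profile leans toward `sv_a` keeps both pronunciation variants of each word, weighted by the profile, rather than being assigned to a single SV. That is the contrast with a classification scheme that the abstract draws.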
Symbolic speaker adaptation with phone inventory expansion
ICASSP 2003, 28th IEEE International Conference on Acoustics, Speech, and Signal Processing, April 6-10, 2003, Hong Kong
© 2003 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
PERMALINK : https://www.eurecom.fr/publication/1209