Giovanni Soldi, Simon Bozonnet, Federico Alegre, Christophe Beaugeant and Nicholas Evans
ODYSSEY 2014, Speaker and Language Recognition Workshop, June 16-19, 2014, Joensuu, Finland
Abstract: This paper presents a new approach to feature-level phone normalisation which aims to improve speaker modelling in the case of short-duration training data. The new approach is referred to as phone adaptive training (PAT). Based on constrained maximum likelihood linear regression (cMLLR) and previous work in speaker adaptive training (SAT), PAT learns a set of transforms which project features into a new phone-normalised but speaker-discriminative space. PAT was originally investigated in the context of speaker diarization; this paper presents new work to assess and optimise it at the level of speaker modelling and in the context of automatic speaker verification (ASV). Experiments show that PAT improves the performance of a state-of-the-art iVector ASV system by 50% relative to the baseline.
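To illustrate the kind of projection described above, the following is a minimal sketch of applying per-phone affine feature transforms of the cMLLR form x' = A x + b. It covers only the application step: estimating the transforms by maximum likelihood, which is the core of PAT, is not shown. The function name `apply_phone_transforms`, the toy features, the phone labels and the transform values are all hypothetical and chosen for illustration; frame-level phone labels are assumed to be available.

```python
import numpy as np

def apply_phone_transforms(features, phone_labels, transforms):
    """Apply a per-phone affine transform x' = A x + b to each frame.

    features     : (T, D) array of acoustic feature vectors
    phone_labels : length-T sequence of phone identifiers (assumed given)
    transforms   : dict mapping phone -> (A, b), with A of shape (D, D)
                   and b of shape (D,); values here are illustrative only
    """
    normalised = np.empty_like(features, dtype=float)
    for t, (x, phone) in enumerate(zip(features, phone_labels)):
        A, b = transforms[phone]
        normalised[t] = A @ x + b
    return normalised

# Toy data: three 2-dimensional frames labelled with two hypothetical phones.
features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
phone_labels = ["a", "b", "a"]
transforms = {
    "a": (np.eye(2), np.zeros(2)),                  # identity transform
    "b": (2.0 * np.eye(2), np.array([1.0, -1.0])),  # scale and shift
}

normalised = apply_phone_transforms(features, phone_labels, transforms)
```

In this sketch, frames labelled with phone "a" pass through unchanged, while the frame labelled "b" is scaled and shifted. In a real system the transforms would be learned so that phone-dependent variation is reduced while speaker-dependent variation is retained.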