Short-duration speaker modelling with phone adaptive training

Soldi, Giovanni; Bozonnet, Simon; Alegre, Federico; Beaugeant, Christophe; Evans, Nicholas
ODYSSEY 2014, Speaker and Language Recognition Workshop, June 16-19, 2014, Joensuu, Finland

This paper presents a new approach to feature-level phone normalisation which aims to improve speaker modelling in the case of short-duration training data. The new approach is referred to as phone adaptive training (PAT). Based on constrained maximum likelihood linear regression (cMLLR) and previous work in speaker adaptive training (SAT), PAT learns a set of transforms which project features into a new phone-normalised but speaker-discriminative space. PAT was originally investigated in the context of speaker diarization; this paper presents new work to assess and optimise PAT at the level of speaker modelling and in the context of automatic speaker verification (ASV). Experiments show that PAT improves the performance of a state-of-the-art iVector ASV system by 50% relative to the baseline.

Type: Conference
City: Joensuu
Date: 2014-06-16
Department: Digital Security
Eurecom Ref: 4312
Copyright: © ISCA. Personal use of this material is permitted. The definitive version of this paper was published in ODYSSEY 2014, Speaker and Language Recognition Workshop, June 16-19, 2014, Joensuu, Finland and is available at :

PERMALINK : https://www.eurecom.fr/publication/4312