Short-duration speaker modelling with phone adaptive training

Soldi, Giovanni; Bozonnet, Simon; Alegre, Federico; Beaugeant, Christophe; Evans, Nicholas

ODYSSEY 2014, Speaker and Language Recognition Workshop, June 16-19, 2014, Joensuu, Finland

This paper presents a new approach to feature-level phone normalisation which aims to improve speaker modelling in the case of short-duration training data. The new approach is referred to as phone adaptive training (PAT). Based on constrained maximum likelihood linear regression (cMLLR) and previous work in speaker adaptive training (SAT), PAT learns a set of transforms which project features into a new phone-normalised but speaker-discriminative space. PAT was originally investigated in the context of speaker diarization; this paper presents new work to assess and optimise PAT at the level of speaker modelling and in the context of automatic speaker verification (ASV). Experiments show that PAT improves the performance of a state-of-the-art iVector ASV system by 50% relative to the baseline.

Type: Conference
City: Joensuu
Date: 2014-06-16
Department: Digital Security
Eurecom Ref: 4312

© ISCA. Personal use of this material is permitted. The definitive version of this paper was published in ODYSSEY 2014, Speaker and Language Recognition Workshop, June 16-19, 2014, Joensuu, Finland and is available at :