A multimodal approach to initialisation for top-down speaker diarization of television shows

Bozonnet, Simon; Vallet, Félicien; Evans, Nicholas; Essid, Slim; Richard, Gaël; Carrive, Jean
EUSIPCO 2010, 18th European Signal Processing Conference, August 23-27, 2010, Aalborg, Denmark 


This paper presents a new multimodal approach to speaker diarization of TV show data. We hypothesize that intra-speaker variation in visual information might be less than that in the corresponding acoustic information, and that visual information might therefore be better suited to the task of speaker model initialisation. Initialisation is an acknowledged weakness of the computationally efficient top-down approach to speaker diarization that is used here. Experimental results show that a recently proposed approach to purification and the new multimodal approach to initialisation together deliver 22% and 17% relative improvements in diarization performance over the baseline system on independent development and evaluation datasets respectively.


Type: Conference
City: Aalborg
Date: 2010-08-23
Department: Digital Security
Eurecom Ref: 3120
Copyright: © EURASIP. Personal use of this material is permitted. The definitive version of this paper was published in EUSIPCO 2010, 18th European Signal Processing Conference, August 23-27, 2010, Aalborg, Denmark and is available at :

PERMALINK : https://www.eurecom.fr/publication/3120