Graduate School and Research Center in Digital Sciences

A multimodal approach to initialisation for top-down speaker diarization of television shows

Bozonnet, Simon; Vallet, Félicien; Evans, Nicholas; Essid, Slim; Richard, Gaël; Carrive, Jean

Research Report RR-10-239, May 5th, 2010

This technical report presents a new multimodal approach to speaker diarization of TV show data. We hypothesize that the inter-speaker variation in visual information might be less than that in the corresponding acoustic information and therefore might be better suited to the task of speaker model initialisation, an acknowledged weakness of the computationally efficient top-down approach to the task of speaker diarization that is used here. Experimental results show that a recently proposed approach to purification and the new multimodal approach to initialisation together deliver 22% and 17% relative improvements in diarization performance over the baseline system on independent development and evaluation datasets respectively.


Title:A multimodal approach to initialisation for top-down speaker diarization of television shows
Keywords:Speaker diarization, speaker clustering, speaker segmentation, content indexing, multimodal system, video, audio, fusion system, TV-shows
Type:Report
Language:English
City:
Date:
Department:Digital Security
Eurecom ref:3097
Copyright: © EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Research Report RR-10-239, May 5th, 2010 and is available at :
Bibtex: @techreport{EURECOM+3097,
  year = {2010},
  title = {{A} multimodal approach to initialisation for top-down speaker diarization of television shows},
  author = {{B}ozonnet, {S}imon and {V}allet, {F}{\'e}licien and {E}vans, {N}icholas and {E}ssid, {S}lim and {R}ichard, {G}a{\"e}l and {C}arrive, {J}ean},
  number = {EURECOM+3097},
  month = {05},
  institution = {Eurecom},
  url = {http://www.eurecom.fr/publication/3097}
}