Graduate School and Research Center in Digital Sciences

An integrated top-down/bottom-up approach to speaker diarization

Bozonnet, Simon; Evans, Nicholas; Fredouille, C; Wang, Dong; Troncy, Raphaël

INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, September 26-30, 2010, Makuhari, Japan

Most speaker diarization systems fit into one of two categories: bottom-up or top-down. Bottom-up systems are the most popular but can suffer from instability caused by difficulties with merging and stopping criteria. Top-down systems deliver competitive results but are particularly prone to poor model initialization, which often leads to large variations in performance. This paper presents a new integrated bottom-up/top-down approach to speaker diarization which aims to harness the strengths of each system and thus to improve performance and stability. In contrast to previous work, here the two systems are fused at the heart of the segmentation and clustering stage. Experimental results show improvements in speaker diarization performance for both meeting and TV-show domain data, indicating increased intra- and inter-domain stability. On the TV-show data in particular, an average relative improvement of 32% in diarization error rate (DER) is obtained.


Title: An integrated top-down/bottom-up approach to speaker diarization
Keywords: Speaker diarization, speaker segmentation, speaker clustering, system combination, SDM
Type: Conference
Language: English
City: Makuhari
Country: JAPAN
Date:
Department: Digital Security
Eurecom ref: 3154
Copyright: © ISCA. Personal use of this material is permitted. The definitive version of this paper was published in INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, September 26-30, 2010, Makuhari, Japan and is available at :
Bibtex:
@inproceedings{EURECOM+3154,
  year = {2010},
  title = {{A}n integrated top-down/bottom-up approach to speaker diarization},
  author = {{B}ozonnet, {S}imon and {E}vans, {N}icholas and {F}redouille, {C} and {W}ang, {D}ong and {T}roncy, {R}apha{\"e}l},
  booktitle = {{INTERSPEECH} 2010, 11th {A}nnual {C}onference of the {I}nternational {S}peech {C}ommunication {A}ssociation, {S}eptember 26-30, 2010, {M}akuhari, {J}apan},
  address = {{M}akuhari, {JAPAN}},
  month = {09},
  url = {http://www.eurecom.fr/publication/3154}
}