An integrated top-down/bottom-up approach to speaker diarization

Bozonnet, Simon; Evans, Nicholas; Fredouille, C; Wang, Dong; Troncy, Raphaël
INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, September 26-30, 2010, Makuhari, Japan

Most speaker diarization systems fit into one of two categories: bottom-up or top-down. Bottom-up systems are the most popular but can sometimes suffer from instability arising from difficulties with merging and stopping criteria. Top-down systems deliver competitive results but are particularly prone to poor model initialization, which often leads to large variations in performance. This paper presents a new integrated bottom-up/top-down approach to speaker diarization which aims to harness the strengths of each system and thus to improve performance and stability. In contrast to previous work, here the two systems are fused at the heart of the segmentation and clustering stage. Experimental results show improvements in speaker diarization performance for both meeting and TV-show domain data, indicating increased intra- and inter-domain stability. On the TV-show data in particular, an average relative improvement of 32% in diarization error rate (DER) is obtained.


Type: Conference
City: Makuhari
Date: 2010-09-26
Department: Digital Security
Eurecom Ref: 3154
Copyright:
© ISCA. Personal use of this material is permitted. The definitive version of this paper was published in INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, September 26-30, 2010, Makuhari, Japan and is available at: http://dx.doi.org/10.21437/Interspeech.2010-702

Permalink: https://www.eurecom.fr/publication/3154