Simon Bozonnet, Nicholas Evans, C Fredouille, Dong Wang and Raphaël Troncy
INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, September 26-30, 2010, Makuhari, Japan
Abstract: Most speaker diarization systems fit into one of two categories: bottom-up or top-down. Bottom-up systems are the most popular but can suffer from instability caused by difficulties with the merging and stopping criteria. Top-down systems deliver competitive results but are particularly prone to poor model initialization, which often leads to large variations in performance. This paper presents a new integrated bottom-up/top-down approach to speaker diarization which aims to harness the strengths of each system and thus to improve performance and stability. In contrast to previous work, here the two systems are fused at the heart of the segmentation and clustering stage. Experimental results show improvements in speaker diarization performance for both meeting and TV-show domain data, indicating increased intra- and inter-domain stability. On the TV-show data in particular, an average relative improvement of 32% in diarization error rate (DER) is obtained.