System output combination for improved speaker diarization

Bozonnet, Simon; Evans, Nicholas; Anguera, X; Vinyals, O; Friedland, G; Fredouille, Corinne
INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, September 26-30, 2010, Makuhari, Japan

System combination or fusion is a popular, successful and sometimes straightforward means of improving performance in many fields of statistical pattern classification, including speech and speaker recognition. Whilst there is significant work in the literature which aims to improve speaker diarization performance by combining multiple feature streams, there is little work which aims to combine the outputs of multiple systems. This paper reports our first attempts to combine the outputs of two state-of-the-art speaker diarization systems, namely ICSI's bottom-up and LIA-EURECOM's top-down systems. We show that a cluster matching procedure reliably identifies corresponding speaker clusters in the two system outputs and that, when they are used in a new realignment and resegmentation stage, the combination leads to relative improvements of 13% and 7% DER on independent development and evaluation sets.
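
To make the idea of matching speaker clusters across two system outputs concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not specify the matching criterion, so the sketch assumes clusters are paired greedily by total temporal overlap. The function names, the segment format `(start, end, label)` and the toy data are all hypothetical.

```python
from collections import defaultdict


def overlap_matrix(segs_a, segs_b):
    """Total overlap in seconds between every pair of labels.

    Each segment is a (start, end, label) tuple (assumed format).
    """
    overlap = defaultdict(float)
    for start_a, end_a, label_a in segs_a:
        for start_b, end_b, label_b in segs_b:
            shared = min(end_a, end_b) - max(start_a, start_b)
            if shared > 0:
                overlap[(label_a, label_b)] += shared
    return overlap


def match_clusters(segs_a, segs_b):
    """Greedy one-to-one matching by decreasing overlap.

    This criterion is an assumption made for illustration only.
    """
    overlap = overlap_matrix(segs_a, segs_b)
    pairs = {}
    used_b = set()
    for (label_a, label_b), _ in sorted(overlap.items(), key=lambda kv: -kv[1]):
        if label_a not in pairs and label_b not in used_b:
            pairs[label_a] = label_b
            used_b.add(label_b)
    return pairs


# Toy outputs from two hypothetical systems covering the same 12 seconds.
system_a = [(0.0, 4.0, "A1"), (4.0, 9.0, "A2"), (9.0, 12.0, "A1")]
system_b = [(0.0, 3.5, "B2"), (3.5, 9.5, "B1"), (9.5, 12.0, "B2")]

matches = match_clusters(system_a, system_b)
print(matches)
```

In this toy case the two systems use different labels for the same speakers, and the matching recovers the correspondence from overlap alone. Clusters left unmatched (for example when the systems disagree on the number of speakers) would need separate handling, which this sketch does not attempt.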


Type: Conference
City: Makuhari
Date: 2010-09-26
Department: Digital Security
Eurecom Ref: 3155
Copyright: © ISCA. Personal use of this material is permitted. The definitive version of this paper was published in INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, September 26-30, 2010, Makuhari, Japan and is available at: http://dx.doi.org/10.21437/Interspeech.2010-701

PERMALINK : https://www.eurecom.fr/publication/3155