Speech overlap detection using convolutive non-negative sparse coding

Vipperla, Ravichander; Wang, Dong; Bozonnet, Simon; Evans, Nicholas
Research Report RR-11-257

Overlapping speech is known to degrade speaker diarization performance, affecting speech activity detection as well as speaker clustering and segmentation (speaker error). While previous related work has made important advances, the problem remains largely unsolved.

This paper reports early work investigating the application of non-negative matrix factorisation (NMF) to the overlap problem. NMF aims to decompose a composite signal into its underlying contributory parts and is thus naturally suited to detecting overlap and attributing it to the contributing speakers. With additional sparsity constraints, the algorithm is shown to be effective in identifying overlapping speech and gives a relative improvement of 11% in equal error rate over a baseline approach based on conventional Gaussian mixture models. Experiments with source attribution show a relative improvement on the order of 40%.
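To make the decomposition idea concrete, here is a minimal sketch in Python with NumPy. It is not the method of the report: it uses plain (non-convolutive) NMF with an L1 sparsity penalty on the activations, and it assumes that a non-negative dictionary is already available for each of two speakers. The function names (`sparse_activations`, `detect_overlap`), the `sparsity` and `threshold` values, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def sparse_activations(V, W, sparsity=0.1, n_iter=200, eps=1e-9):
    """Estimate non-negative activations H with V ≈ W @ H for a fixed dictionary W.

    Uses multiplicative updates for the Euclidean cost plus an L1 penalty
    (weight `sparsity`) on H. This is an illustrative choice, not the
    report's convolutive formulation.
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + sparsity + eps)
    return H


def detect_overlap(V, W_a, W_b, threshold=0.2, **kwargs):
    """Flag frames in which both speaker dictionaries explain part of V.

    A frame counts as overlap when the weaker speaker's share of the
    reconstructed energy exceeds `threshold` (an assumed, untuned value).
    """
    W = np.hstack([W_a, W_b])
    H = sparse_activations(V, W, **kwargs)
    k = W_a.shape[1]
    energy_a = np.sum(W_a @ H[:k], axis=0)
    energy_b = np.sum(W_b @ H[k:], axis=0)
    total = energy_a + energy_b + 1e-12
    return np.minimum(energy_a, energy_b) / total > threshold


# Synthetic demo: two "speakers" occupying disjoint frequency bins.
rng = np.random.default_rng(1)
W_a = np.zeros((20, 3))
W_a[:10] = rng.random((10, 3)) + 0.1   # speaker A uses bins 0-9
W_b = np.zeros((20, 3))
W_b[10:] = rng.random((10, 3)) + 0.1   # speaker B uses bins 10-19

H_a = np.zeros((3, 30))
H_a[:, :20] = rng.random((3, 20)) + 1.0   # A active in frames 0-19
H_b = np.zeros((3, 30))
H_b[:, 10:] = rng.random((3, 20)) + 1.0   # B active in frames 10-29

V = W_a @ H_a + W_b @ H_b                 # frames 10-19 contain both speakers
overlap = detect_overlap(V, W_a, W_b)
print(np.flatnonzero(overlap))
```

In this toy setting the two dictionaries have disjoint support, so attribution is easy. Real speech spectra from different speakers overlap heavily in frequency, which is where sparsity constraints and temporal (convolutive) bases become important.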

Digital Security
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Research Report RR-11-257 and is available at:

PERMALINK: https://www.eurecom.fr/publication/3423