Ravichander Vipperla, Dong Wang, Simon Bozonnet and Nicholas Evans
Research Report RR-11-257
Abstract: Overlapping speech is known to degrade speaker diarization performance, with impacts on both speech activity detection and speaker clustering and segmentation (speaker error). While previous related work has made important advances, the problem remains largely unsolved. This paper reports early work investigating the application of non-negative matrix factorisation (NMF) to the overlap problem. NMF aims to decompose a composite signal into its underlying contributory parts and is thus naturally suited to detecting overlap and attributing it to the contributing speakers. With additional sparsity constraints, the algorithm is shown to be effective in identifying overlapping speech and gives a relative improvement of 11% in equal error rate over a baseline approach based on conventional Gaussian mixture models. Experiments with source attribution show a relative improvement in the order of 40%.
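To make the decomposition idea concrete, the following is a minimal sketch of sparse NMF with fixed per-speaker bases, applied to toy data. It is not the system evaluated in this report: the Euclidean cost with multiplicative updates, the L1 penalty weight, the synthetic "spectra", the 0.2 decision threshold and all names (`sparse_activations`, `share_a`, `overlap`) are assumptions chosen for illustration only.

```python
import numpy as np

def sparse_activations(V, W, sparsity=0.05, n_iter=500, eps=1e-12, seed=0):
    """Estimate non-negative activations H such that V ~= W @ H, with W fixed.

    Uses multiplicative updates for the Euclidean cost plus an L1 penalty
    on H (an assumed formulation, not necessarily the report's).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        # The penalty appears in the denominator and shrinks H towards zero.
        H *= WtV / (WtW @ H + sparsity + eps)
    return H

# Toy bases: speaker A occupies bins 0-11, speaker B bins 8-19 (hypothetical).
rng = np.random.default_rng(1)
n_bins, n_bases = 20, 4
mask_a = (np.arange(n_bins) < 12)[:, None]
mask_b = (np.arange(n_bins) >= 8)[:, None]
W_a = rng.random((n_bins, n_bases)) * mask_a
W_b = rng.random((n_bins, n_bases)) * mask_b
W = np.hstack([W_a, W_b])

# Three frames: speaker A alone, speaker B alone, both speakers overlapping.
h_a = np.array([1.0, 0.5, 0.8, 0.3])
h_b = np.array([0.6, 0.9, 0.4, 0.7])
V = np.stack([W_a @ h_a, W_b @ h_b, W_a @ h_a + W_b @ h_b], axis=1)

H = sparse_activations(V, W)

# Source attribution: fraction of total activation assigned to speaker A.
act_a = H[:n_bases].sum(axis=0)
act_b = H[n_bases:].sum(axis=0)
share_a = act_a / (act_a + act_b)

# Overlap detection: both speakers hold a non-trivial share of the frame.
overlap = np.minimum(share_a, 1.0 - share_a) > 0.2
```

In this sketch the bases are held fixed and only the activations are estimated, so each frame's activation mass can be read directly as a per-speaker attribution. A real system would learn the bases from speech and tune the threshold on development data.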