Speaker change detection benefits a number of speech processing tasks, such as speaker diarization, recognition and detection. Current solutions rely either on highly localized data or on training with large quantities of background data. The former are efficient but tend to over-segment; the latter are more stable but less efficient, and need adaptation to mismatched data. Building on previous work in speaker recognition and diarization, this paper reports a new binary key (BK) modelling approach to speaker change detection which aims to strike a balance between efficiency and segmentation accuracy. The BK approach benefits from training using a controllable degree of contextual data, rather than relying on external background data, and is efficient in terms of computation and speaker discrimination. Experiments on a subset of the standard ETAPE database show that the new approach outperforms current state-of-the-art methods for speaker change detection and gives average relative improvements in segment coverage and purity of 18.71% and 4.51%, respectively.
Speaker change detection using binary key modelling with contextual information
SLSP 2017, 5th International Conference on Statistical Language and Speech Processing, October 23-25, 2017, Le Mans, France
© Springer. Personal use of this material is permitted. The definitive version of this paper was published in SLSP 2017, 5th International Conference on Statistical Language and Speech Processing, October 23-25, 2017, Le Mans, France and is available at:
PERMALINK: https://www.eurecom.fr/publication/5338