LINGUISTIC INFLUENCES ON BOTTOM-UP AND TOP-DOWN CLUSTERING FOR SPEAKER DIARIZATION

Simon Bozonnet - PhD student, MM department
Multimedia Communications

Date: -
Location: Eurecom

While bottom-up approaches have emerged as the standard, default approach to clustering for speaker diarization, we have always found that the top-down approach gives equivalent or superior performance. Our recent work shows that significant gains in performance can be obtained when cluster purification is applied to the output of top-down systems, but that purification can degrade performance when applied to the output of bottom-up systems. In this presentation we demonstrate that these observations can be accounted for by factors unrelated to the speaker, and that such factors have a stronger impact on the performance of bottom-up clustering strategies than on that of top-down strategies. Experimental results confirm that clusters produced through top-down clustering are better normalized against phone variation than those produced through bottom-up clustering, and that this accounts for the observed inconsistencies in purification performance. The work highlights the need for marginalization strategies that encourage convergence toward different speakers rather than toward nuisance factors such as those related to the linguistic content.