
 Research Themes

 

 

 

 Improving speaker diarisation with multiple distant microphones

Otherwise known as the “who spoke when?” task, speaker diarisation aims to identify, first, the number of speakers in an audio document and, second, the (possibly overlapping) intervals during which each speaker is active. Applications include automatic annotation, speaker tracking, speaker-based indexing and speaker adaptation.

 

Whilst conventional cepstral features extracted from a single microphone channel have proved successful, recent work has highlighted the benefits of using multiple microphones. This can involve beamforming the multiple channels to obtain a single enhanced, virtual channel, or estimating new features based upon the delay between channels.
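As an illustration of the two ideas above, the following minimal sketch estimates a between-channel delay and uses it to form a simple delay-and-sum virtual channel. It assumes GCC-PHAT (generalised cross-correlation with phase transform) as the delay estimator, which is a common choice but is not stated here as the method used in this research. The function names, the synthetic signals and the 7-sample delay are all hypothetical.

```python
import numpy as np

def gcc_phat_delay(x, ref, max_lag):
    """Estimate the delay (in samples) of x relative to ref using GCC-PHAT.

    A positive result means x lags behind ref. max_lag must be >= 1.
    """
    n = len(x) + len(ref)              # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    R = np.fft.rfft(ref, n=n)
    cross = X * np.conj(R)
    cross /= np.abs(cross) + 1e-12     # phase transform: discard magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that the lags run from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag

def delay_and_sum(channels, delays):
    """Align each channel by its delay and average (simplified beamformer).

    np.roll shifts circularly, which is an approximation at the signal edges.
    """
    aligned = [np.roll(ch, -d) for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)

# Synthetic two-microphone example (assumed data, not from a real recording).
rng = np.random.default_rng(0)
ref = rng.standard_normal(4000)                      # reference microphone
true_delay = 7                                       # hypothetical delay in samples
mic2 = np.concatenate((np.zeros(true_delay), ref[:-true_delay]))
mic2 = mic2 + 0.1 * rng.standard_normal(mic2.size)   # additive sensor noise

est = gcc_phat_delay(mic2, ref, max_lag=20)
enhanced = delay_and_sum([ref, mic2], [0, est])
print("estimated delay (samples):", est)
```

In a diarisation setting the delay would be re-estimated on successive short windows, giving one delay value per window and channel pair that can serve as a feature alongside the cepstral coefficients.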

 

However, whilst the potential of using multiple microphones is clear, the performance obtained with delay features alone falls far short of that obtained with cepstral features. One reason is that the between-channel delay is measured over short time windows, which makes for noisy features. In addition, few diarisation systems are capable of effectively utilising the combined cepstral and delay features.

 

This research aims to investigate new, robust features from multiple-microphone recordings and to fuse the cepstral and delay features successfully in order to improve diarisation performance.