Graduate School and Research Center in Digital Sciences

Speaker diarization: A review of recent research

Anguera, X; Bozonnet, Simon; Evans, Nicholas; Fredouille, Corinne; Friedland, G; Vinyals, O

"IEEE Transactions On Audio, Speech, and Language Processing" (TASLP), special issue on "New Frontiers in Rich Transcription", February 2012, Volume 20, N°2, ISSN: 1558-7916

Speaker diarization is the task of determining "who spoke when?" in an audio or video recording that contains an unknown amount of speech and an unknown number of speakers. It was initially proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become a key technology for many tasks, such as navigation, retrieval, and higher-level inference on audio data. Accordingly, many important improvements in accuracy and robustness have been reported in journals and conferences in the area. The application domains, from broadcast news to lectures and meetings, vary greatly and pose different problems, such as access to multiple microphones and multimodal information, or overlapping speech. The most recent review of existing technology dates back to 2006 and focuses on the broadcast news domain. In this paper, we review the current state of the art, focusing on research developed since 2006 that relates predominantly to speaker diarization for conference meetings. Finally, we present an analysis of speaker diarization performance as reported through the NIST Rich Transcription evaluations on meeting data and identify important areas for future research.
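The NIST Rich Transcription evaluations mentioned in the abstract score diarization output with the diarization error rate (DER). DER is the sum of missed speech, false-alarm speech and speaker confusion, divided by total reference speech time. It is computed after finding the best one-to-one mapping between reference speakers and the system's arbitrary cluster labels. The sketch below is a simplified, frame-based illustration, not the official scoring tool. It assumes one speaker per frame, so overlapping speech is ignored, and it applies no forgiveness collar around segment boundaries. The official NIST scoring handles both of these. The segment tuple layout, the 10 ms frame step and the names `to_frames` and `der` are hypothetical choices made for illustration.

```python
from itertools import permutations

# Hypothetical segment layout: (start_seconds, end_seconds, speaker_label).
Segment = tuple[float, float, str]


def to_frames(segments: list[Segment], total_dur: float, step: float = 0.01) -> list:
    """Label each fixed-length frame with a speaker, or None for non-speech.

    Simplification: one speaker per frame, so overlapping speech is not modelled
    (a later segment overwrites an earlier one on shared frames).
    """
    n = round(total_dur / step)
    frames = [None] * n
    for start, end, spk in segments:
        for i in range(round(start / step), min(round(end / step), n)):
            frames[i] = spk
    return frames


def der(ref: list[Segment], hyp: list[Segment], total_dur: float, step: float = 0.01) -> float:
    """Frame-based DER: (miss + false alarm + confusion) / reference speech frames."""
    r = to_frames(ref, total_dur, step)
    h = to_frames(hyp, total_dur, step)
    speech = sum(x is not None for x in r)
    if speech == 0:
        raise ValueError("reference contains no speech")

    miss = sum(x is not None and y is None for x, y in zip(r, h))
    false_alarm = sum(x is None and y is not None for x, y in zip(r, h))
    both = [(x, y) for x, y in zip(r, h) if x is not None and y is not None]

    # Hypothesis labels are arbitrary cluster IDs, so search for the one-to-one
    # mapping onto reference speakers that maximises agreement. Padding with
    # None lets a hypothesis speaker stay unmapped. Brute force is only
    # practical for a handful of speakers.
    ref_spk = sorted({x for x in r if x is not None})
    hyp_spk = sorted({y for y in h if y is not None})
    targets = ref_spk + [None] * len(hyp_spk)
    best = 0
    for perm in permutations(targets, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, perm))
        best = max(best, sum(mapping[y] == x for x, y in both))
    confusion = len(both) - best

    return (miss + false_alarm + confusion) / speech


ref = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
hyp = [(0.0, 4.0, "spk1"), (4.0, 9.0, "spk2")]
print(der(ref, hyp, total_dur=10.0))  # 1 s missed + 1 s confused over 10 s of speech → 0.2
```

The example shows why the label mapping matters. The system names its clusters `spk1` and `spk2` rather than `A` and `B`, yet only the 1 s where `spk2` overlaps speaker `A`'s turn counts as confusion. The trailing 1 s with no hypothesis counts as missed speech.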


Title: Speaker diarization: A review of recent research
Department: Digital Security
Eurecom ref: 3152
Copyright: © 2011 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Bibtex:
@article{EURECOM+3152,
  author = {Anguera, X and Bozonnet, Simon and Evans, Nicholas and Fredouille, Corinne and Friedland, G and Vinyals, O},
  title = {{S}peaker diarization: {A} review of recent research},
  journal = {{IEEE} Transactions on Audio, Speech, and Language Processing},
  note = {Special issue on ``{N}ew {F}rontiers in {R}ich {T}ranscription''},
  volume = {20},
  number = {2},
  issn = {1558-7916},
  year = {2012},
  month = feb
}