Engineering school and research center in digital sciences

Low-latency speaker spotting with online diarization and detection

Patino, Jose; Yin, Ruiqing; Delgado, Héctor; Bredin, Hervé; Komaty, Alain; Wisniewski, Guillaume; Barras, Claude; Evans, Nicholas; Marcel, Sébastien

ODYSSEY 2018, The Speaker and Language Recognition Workshop, June 26-29, 2018, Les Sables d'Olonne, France

This paper introduces a new task termed low-latency speaker spotting (LLSS). Related to security and intelligence applications, the task involves the detection, as soon as possible, of known speakers within multi-speaker audio streams. The paper describes how LLSS differs from the established fields of speaker diarization and automatic speaker verification and proposes a new protocol and metrics to support its exploration. These can be used together with an existing, publicly available database to assess the performance of the LLSS solutions also proposed in the paper, which combine online diarization and speaker detection systems. Diarization systems include a naive over-segmentation approach and fully-fledged online diarization using segmental i-vectors. Speaker detection is performed using Gaussian mixture models, i-vectors or neural speaker embeddings. The metrics reflect different approaches to characterising latency in addition to detection performance. The relative performance of each solution depends on latency: when higher latency is admissible, i-vector solutions perform well, whereas embeddings excel when latency must be kept to a minimum. Given the need to improve the reliability of online diarization and detection, the proposed LLSS framework provides a vehicle to fuel future research in both areas. In this respect, we embrace a reproducible research policy; results can be readily reproduced using publicly available resources and open-source code.
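To make the idea of detecting a known speaker "as soon as possible" concrete, the following is a minimal sketch, not the protocol or metrics defined in the paper. It assumes a hypothetical upstream system that emits one target-speaker score per audio chunk, and it measures latency as the time between the target's first speech and the first score that crosses a threshold. The names `spot_speaker`, `SpottingResult`, `threshold` and `target_onset`, and this particular latency definition, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class SpottingResult:
    detected: bool
    detection_time: Optional[float]  # seconds from stream start; None if never detected
    latency: Optional[float]         # detection_time minus target onset; None if undefined


def spot_speaker(
    scored_chunks: Iterable[Tuple[float, float]],
    threshold: float,
    target_onset: Optional[float] = None,
) -> SpottingResult:
    """Scan (end_time, score) pairs in order and stop at the first score >= threshold.

    The scores are assumed to come from some online detector (hypothetical here).
    A negative latency means the detector fired before the target spoke,
    which corresponds to a false alarm.
    """
    for end_time, score in scored_chunks:
        if score >= threshold:
            latency = None if target_onset is None else end_time - target_onset
            return SpottingResult(True, end_time, latency)
    return SpottingResult(False, None, None)


# Hypothetical stream: (chunk end time in seconds, detection score).
stream = [(1.0, -0.8), (2.0, -0.2), (3.0, 0.4), (4.0, 1.3), (5.0, 1.9)]
result = spot_speaker(stream, threshold=1.0, target_onset=2.5)
```

In this toy setting, lowering `threshold` shortens the latency but makes early false alarms more likely, which is the trade-off between latency and detection performance that the abstract refers to.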


Title:Low-latency speaker spotting with online diarization and detection
Type:Conference
Language:English
City:Les Sables d'Olonne
Country:FRANCE
Date:
Department:Digital Security
Eurecom ref:5522
Copyright: © ISCA. Personal use of this material is permitted. The definitive version of this paper was published in ODYSSEY 2018, The Speaker and Language Recognition Workshop, June 26-29, 2018, Les Sables d'Olonne, France and is available at :
Bibtex: @inproceedings{EURECOM+5522,
  year = {2018},
  title = {{L}ow-latency speaker spotting with online diarization and detection},
  author = {{P}atino, {J}ose and {Y}in, {R}uiqing and {D}elgado, {H}{\'e}ctor and {B}redin, {H}erv{\'e} and {K}omaty, {A}lain and {W}isniewski, {G}uillaume and {B}arras, {C}laude and {E}vans, {N}icholas and {M}arcel, {S}{\'e}bastien},
  booktitle = {{ODYSSEY} 2018, {T}he {S}peaker and {L}anguage {R}ecognition {W}orkshop, {J}une 26-29, 2018, {L}es {S}ables d'{O}lonne, {F}rance},
  address = {{L}es {S}ables d'{O}lonne, {FRANCE}},
  month = {06},
  url = {http://www.eurecom.fr/publication/5522}
}