
Direct posterior confidence for out-of-vocabulary spoken term detection

Wang, Dong; King, Simon; Evans, Nicholas; Troncy, Raphaël

SSCS 2010, ACM Workshop on Searching Spontaneous Conversational Speech, September 20-24, 2010, Firenze, Italy

Spoken term detection (STD) is a fundamental task in spoken information retrieval. Compared to conventional speech transcription and keyword spotting, STD is an open-vocabulary task and is necessarily required to address out-of-vocabulary (OOV) terms. Approaches based on subword units, e.g. phonemes, are widely used to solve the OOV issue; however, performance on OOV terms is still significantly inferior to that for in-vocabulary (INV) terms.

The performance degradation on OOV terms can be attributed to a multitude of factors. A particular factor we address in this paper is that the acoustic and language models used for speech transcribing are highly vulnerable to OOV terms, which leads to unreliable confidence measures and error-prone detections.

A direct posterior confidence measure that is derived from discriminative models has been proposed for STD. In this paper, we utilize this technique to tackle the weakness of OOV terms in confidence estimation. Neither acoustic models nor language models being included in the computation, the new confidence avoids the weak modeling problem with OOV terms.

Our experiments, set up on multi-party meeting speech which is highly spontaneous and conversational, demonstrate that the proposed technique improves STD performance on OOV terms significantly; when combined with conventional lattice-based confidence, a significant improvement in performance is obtained on both INVs and OOVs. Furthermore, the new confidence measure technique can be combined together with other advanced techniques for OOV treatment, such as stochastic pronunciation modeling and term-dependent confidence discrimination, which leads to an integrated solution for OOV STD with greatly improved performance.


Title: Direct posterior confidence for out-of-vocabulary spoken term detection
Keywords: spoken term detection, speech document search, spontaneous
Type: Conference
Language: English
City: Firenze
Country: Italy
Date:
Department: Digital Security
Eurecom ref: 3153
Copyright: © ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in SSCS 2010, ACM Workshop on Searching Spontaneous Conversational Speech, September 20-24, 2010, Firenze, Italy http://dx.doi.org/10.1145/1878101.1878107
Bibtex:
@inproceedings{EURECOM+3153,
  doi = {http://dx.doi.org/10.1145/1878101.1878107},
  year = {2010},
  title = {{D}irect posterior confidence for out-of-vocabulary spoken term detection},
  author = {{W}ang, {D}ong and {K}ing, {S}imon and {E}vans, {N}icholas and {T}roncy, {R}apha{\"e}l},
  booktitle = {{SSCS} 2010, {ACM} {W}orkshop on {S}earching {S}pontaneous {C}onversational {S}peech, {S}eptember 20-24, 2010, {F}irenze, {I}taly},
  address = {{F}irenze, {ITALIE}},
  month = {09},
  url = {http://www.eurecom.fr/publication/3153}
}