Graduate School and Research Center in Digital Sciences

Direct posterior confidence for out-of-vocabulary spoken term detection

Wang, Dong; King, Simon; Frankel, Joe; Vipperla, Ravichander; Evans, Nicholas; Troncy, Raphaël

ACM Transactions on Information Systems (TOIS), Vol 30, N°3, August 2012

Spoken term detection (STD) is a key technology for spoken information retrieval. Compared with conventional speech transcription and keyword spotting, STD is an open-vocabulary task and has to address out-of-vocabulary (OOV) terms. Approaches based on subword units, e.g. phones, are widely used to solve the OOV issue; however, performance on OOV terms is still substantially inferior to that on in-vocabulary (INV) terms. The performance degradation on OOV terms can be attributed to a multitude of factors. One particular factor we address in this paper is the unreliable confidence estimation caused by weak acoustic and language modeling due to the absence of OOV terms in the training corpora. We propose a direct posterior confidence derived from a discriminative model, such as a multi-layer perceptron (MLP). The new confidence considers a wide acoustic context, which is usually important for speech recognition and retrieval; moreover, it is localized on detected speech segments and therefore avoids the impact of long-span word context, which is usually unreliable for OOV term detection. In this paper we first discuss in detail the modeling weakness problem associated with OOV terms, and then propose our approach to address this problem based on direct posterior confidence. Our experiments, carried out on spontaneous and conversational multi-party meeting speech, demonstrate that the proposed technique provides a significant improvement in STD performance compared with the conventional lattice-based confidence, in particular for OOV terms. Furthermore, the new confidence estimation approach is fused with other advanced techniques for OOV treatment, such as stochastic pronunciation modeling and discriminative confidence normalization. This leads to an integrated solution for OOV term detection that results in a large performance improvement.


Title: Direct posterior confidence for out-of-vocabulary spoken term detection
Keywords: Algorithms, Experimentation, speech recognition, spontaneous speech search, spoken term detection
Department: Digital Security
Eurecom ref: 3639
Copyright: © ACM, 2012. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Information Systems (TOIS), Vol 30, N°3, August 2012
Bibtex:
@article{EURECOM+3639,
  doi = {},
  year = {2012},
  month = {06},
  title = {{D}irect posterior confidence for out-of-vocabulary spoken term detection},
  author = {{W}ang, {D}ong and {K}ing, {S}imon and {F}rankel, {J}oe and {V}ipperla, {R}avichander and {E}vans, {N}icholas and {T}roncy, {R}apha{\"e}l},
  journal = {{ACM} {T}ransactions on {I}nformation {S}ystems ({TOIS}), {V}ol 30, {N}°3, {A}ugust 2012},
  url = {}
}