Direct posterior confidence for out-of-vocabulary spoken term detection

Wang, Dong; King, Simon; Evans, Nicholas; Troncy, Raphaël
SSCS 2010, ACM Workshop on Searching Spontaneous Conversational Speech, September 20-24, 2010, Firenze, Italy

Spoken term detection (STD) is a fundamental task in spoken information retrieval. Unlike conventional speech transcription and keyword spotting, STD is an open-vocabulary task and must therefore handle out-of-vocabulary (OOV) terms. Approaches based on subword units, e.g. phonemes, are widely used to solve the OOV problem; however, performance on OOV terms remains significantly inferior to that on in-vocabulary (INV) terms. The performance degradation on OOV terms can be attributed to many factors. The factor we address in this paper is that the acoustic and language models used for speech transcription are highly vulnerable to OOV terms, which leads to unreliable confidence measures and error-prone detections. A direct posterior confidence measure derived from discriminative models has previously been proposed for STD. In this paper, we use this technique to address the weakness of confidence estimation for OOV terms. Because neither acoustic models nor language models are involved in its computation, the new confidence measure avoids the weak modeling of OOV terms. Our experiments on multi-party meeting speech, which is highly spontaneous and conversational, demonstrate that the proposed technique significantly improves STD performance on OOV terms; when it is combined with conventional lattice-based confidence, a significant improvement is obtained on both INV and OOV terms. Furthermore, the new confidence measure can be combined with other advanced techniques for OOV treatment, such as stochastic pronunciation modeling and term-dependent confidence discrimination, yielding an integrated solution for OOV STD with greatly improved performance.
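The abstract describes the direct posterior confidence only at a high level. The following is a minimal sketch of one plausible reading, under loudly stated assumptions: a discriminative model (hypothetically an MLP phone classifier) supplies frame-level phone posteriors for the frames of one candidate detection; the term's phone sequence is aligned left-to-right to those frames by dynamic programming; and the confidence is the per-frame geometric mean of the aligned posteriors. No acoustic-model likelihoods or language-model scores enter the computation, which is the property the abstract emphasizes. The function name `direct_posterior_confidence`, the posterior floor and the duration normalization are illustrative assumptions, not the authors' exact formulation.

```python
import math
from typing import Sequence


def direct_posterior_confidence(
    posteriors: Sequence[Sequence[float]],
    phones: Sequence[int],
    floor: float = 1e-10,
) -> float:
    """Hypothetical direct posterior confidence for one detection hypothesis.

    posteriors[t][p]: discriminative (assumed MLP) posterior of phone p at
        frame t, restricted to the frames spanned by the candidate detection.
    phones: phone indices of the search term's pronunciation.
    Returns the per-frame geometric mean of the posteriors along the best
    monotonic alignment in which every phone covers at least one frame.
    """
    T, K = len(posteriors), len(phones)
    if K == 0 or T < K:
        raise ValueError("need a non-empty term and at least one frame per phone")

    neg_inf = float("-inf")

    def log_post(t: int, k: int) -> float:
        # Floor the posterior so that log(0) never occurs.
        return math.log(max(posteriors[t][phones[k]], floor))

    # prev[k]: best cumulative log posterior with the current frame on phone k.
    prev = [neg_inf] * K
    prev[0] = log_post(0, 0)
    for t in range(1, T):
        cur = [neg_inf] * K
        for k in range(K):
            stay = prev[k]                                # remain in phone k
            advance = prev[k - 1] if k > 0 else neg_inf  # move on from phone k-1
            best = max(stay, advance)
            if best > neg_inf:
                cur[k] = best + log_post(t, k)
        prev = cur

    # Normalize by duration, then map the average log posterior into (0, 1].
    return math.exp(prev[K - 1] / T)
```

For a four-frame hypothesis whose posteriors favour phones 0, 0, 1, 2 in turn, `direct_posterior_confidence(post, [0, 1, 2])` scores the matching pronunciation highly, while a mismatched pronunciation such as `[2, 1, 0]` receives a much lower score. Normalizing by the number of frames is one design choice that keeps long and short terms on a comparable scale.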
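The abstract reports that combining the direct posterior confidence with conventional lattice-based confidence improves performance on both INV and OOV terms, but it does not state how the two scores are combined. A minimal sketch, assuming a weighted log-linear (geometric) interpolation followed by a detection threshold: the names `fuse_confidence` and `detect`, the tuple layout of a hit, and the weight and threshold values are hypothetical, and in practice both parameters would presumably be tuned on held-out data.

```python
import math
from typing import List, Tuple

# Hypothetical hit layout: (term, start_time_s, lattice_conf, direct_conf).
Hit = Tuple[str, float, float, float]


def fuse_confidence(lattice_conf: float, direct_conf: float,
                    weight: float = 0.5, floor: float = 1e-10) -> float:
    """Weighted log-linear interpolation of two confidences in (0, 1]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    log_c = (weight * math.log(max(lattice_conf, floor))
             + (1.0 - weight) * math.log(max(direct_conf, floor)))
    return math.exp(log_c)


def detect(hits: List[Hit], weight: float,
           threshold: float) -> List[Tuple[str, float, float]]:
    """Keep hits whose fused confidence reaches the threshold, best first."""
    scored = [(term, start, fuse_confidence(lat, direct, weight))
              for term, start, lat, direct in hits]
    kept = [h for h in scored if h[2] >= threshold]
    return sorted(kept, key=lambda h: h[2], reverse=True)
```

Setting `weight=1.0` recovers the lattice-based confidence alone, and `weight=0.0` the direct posterior confidence alone, so the single parameter spans both baselines.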
© ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in SSCS 2010, ACM Workshop on Searching Spontaneous Conversational Speech, September 20-24, 2010, Firenze, Italy