Direct posterior confidence for out-of-vocabulary spoken term detection

Wang, Dong; King, Simon; Evans, Nicholas; Troncy, Raphaël

SSCS 2010, ACM Workshop on Searching Spontaneous Conversational Speech, September 20-24, 2010, Firenze, Italy

Spoken term detection (STD) is a fundamental task in spoken information retrieval. Unlike conventional speech transcription and keyword spotting, STD is an open-vocabulary task and must address out-of-vocabulary (OOV) terms. Approaches based on subword units, e.g. phonemes, are widely used to solve the OOV issue; however, performance on OOV terms is still significantly inferior to that for in-vocabulary (INV) terms.

The performance degradation on OOV terms can be attributed to a multitude of factors. The factor we address in this paper is that the acoustic and language models used for speech transcription are highly vulnerable to OOV terms, which leads to unreliable confidence measures and error-prone detections.

A direct posterior confidence measure derived from discriminative models has been proposed for STD. In this paper, we use this technique to address the weakness of confidence estimation for OOV terms. Because neither acoustic models nor language models are involved in the computation, the new confidence avoids the weak modeling of OOV terms. Our experiments on multi-party meeting speech, which is highly spontaneous and conversational, demonstrate that the proposed technique significantly improves STD performance on OOV terms; when combined with conventional lattice-based confidence, it yields a significant improvement on both INV and OOV terms.
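The abstract does not give the exact formulation of the direct posterior confidence. As a rough illustration only, one way to compute a confidence without acoustic or language models is the following. First, align the term's phone sequence monotonically against frame-level phone posteriors from a discriminative classifier. Then average the log posteriors along the best path. The sketch below assumes that formulation. The function names, the `floor` value and the fusion weight `alpha` are hypothetical. The linear interpolation with a lattice-based confidence is one simple fusion choice and not necessarily the one used in the paper.

```python
import math
from typing import Sequence


def direct_posterior_confidence(posteriors: Sequence[Sequence[float]],
                                phones: Sequence[int],
                                floor: float = 1e-10) -> float:
    """Average frame-level log posterior along the best monotonic alignment.

    posteriors[t][p] is the posterior of phone p at frame t, assumed to come
    from a discriminative phone classifier (hypothetical input).
    phones lists the phone indices of the term's pronunciation.
    Each phone covers at least one frame, and phones are visited in order.
    """
    num_frames, num_phones = len(posteriors), len(phones)
    if num_phones == 0 or num_frames < num_phones:
        raise ValueError("need a non-empty term and at least one frame per phone")

    def logp(t: int, n: int) -> float:
        # Floor the posterior so log() stays finite when a phone has zero posterior.
        return math.log(max(posteriors[t][phones[n]], floor))

    neg_inf = float("-inf")
    # score[t][n]: best cumulative log posterior with frame t assigned to phone n.
    score = [[neg_inf] * num_phones for _ in range(num_frames)]
    score[0][0] = logp(0, 0)
    for t in range(1, num_frames):
        for n in range(min(t + 1, num_phones)):
            stay = score[t - 1][n]                                # same phone continues
            advance = score[t - 1][n - 1] if n > 0 else neg_inf  # move to next phone
            best = max(stay, advance)
            if best > neg_inf:
                score[t][n] = best + logp(t, n)
    # Normalise by duration so short and long segments are comparable.
    return score[num_frames - 1][num_phones - 1] / num_frames


def fuse_confidence(dpc: float, lattice_conf: float, alpha: float = 0.5) -> float:
    """Hypothetical linear fusion with a lattice-based confidence in [0, 1]."""
    # exp(average log posterior) is the geometric-mean posterior, also in [0, 1].
    return alpha * math.exp(dpc) + (1.0 - alpha) * lattice_conf
```

For example, with four frames whose posteriors peak on phones 0, 0, 1 and 2, the term [0, 1, 2] receives a confidence of 0.0 (the log of 1). The term [2, 2, 2] is pushed towards the floor. Because no pronunciation-specific acoustic model or language model score enters the computation, an OOV term is scored in the same way as an INV term.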

Furthermore, the new confidence measure can be combined with other advanced techniques for OOV treatment, such as stochastic pronunciation modeling and term-dependent confidence discrimination, leading to an integrated solution for OOV STD with greatly improved performance.


Type: Conference
City: Firenze
Date: 2010-09-20
Department: Digital Security
Eurecom Ref: 3153

© ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in SSCS 2010, ACM Workshop on Searching Spontaneous Conversational Speech, September 20-24, 2010, Firenze, Italy http://dx.doi.org/10.1145/1878101.1878107