Direct posterior confidence for out-of-vocabulary spoken term detection

Wang, Dong; King, Simon; Evans, Nicholas; Troncy, Raphaël
SSCS 2010, ACM Workshop on Searching Spontaneous Conversational Speech, September 20-24, 2010, Firenze, Italy

Spoken term detection (STD) is a fundamental task in spoken information retrieval. Compared to conventional speech transcription and keyword spotting, STD is an open-vocabulary task and is necessarily required to address out-of-vocabulary (OOV) terms. Approaches based on subword units, e.g. phonemes, are widely used to solve the OOV issue; however, performance on OOV terms is still significantly inferior to that for in-vocabulary (INV) terms.

The performance degradation on OOV terms can be attributed to a multitude of factors. A particular factor we address in this paper is that the acoustic and language models used for speech transcribing are highly vulnerable to OOV terms, which leads to unreliable confidence measures and error-prone detections.

A direct posterior confidence measure that is derived from discriminative models has been proposed for STD. In this paper, we utilize this technique to tackle the weakness of OOV terms in confidence estimation. Neither acoustic models nor language models being included in the computation, the new confidence avoids the weak modeling problem with OOV terms. Our experiments, set up on multi-party meeting speech which is highly spontaneous and conversational, demonstrate that the proposed technique improves STD performance on OOV terms significantly; when combined with conventional lattice-based confidence, a significant improvement in performance is obtained on both INVs and OOVs.

Furthermore, the new confidence measure technique can be combined together with other advanced techniques for OOV treatment, such as stochastic pronunciation modeling and term-dependent confidence discrimination, which leads to an integrated solution for OOV STD with greatly improved performance.


DOI: 10.1145/1878101.1878107
Type: Conference
City: Firenze
Date: 2010-09-20
Department: Digital Security
Eurecom Ref: 3153
Copyright: © ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in SSCS 2010, ACM Workshop on Searching Spontaneous Conversational Speech, September 20-24, 2010, Firenze, Italy http://dx.doi.org/10.1145/1878101.1878107

Permalink: https://www.eurecom.fr/publication/3153