CRF-based stochastic pronunciation modeling for out-of-vocabulary spoken term detection

Wang, Dong; King, Simon; Evans, Nicholas; Troncy, Raphaël

INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, September 26-30, 2010, Makuhari, Japan

Out-of-vocabulary (OOV) terms present a significant challenge to spoken term detection (STD). This challenge, to a large extent, lies in the high degree of uncertainty in the pronunciations of OOV terms. In previous work, we presented a stochastic pronunciation modeling (SPM) approach to compensate for this uncertainty. A shortcoming of our original work, however, is that the SPM was based on a joint-multigram model (JMM), which is suboptimal. In this paper, we propose to use conditional random fields (CRFs) for letter-to-sound conversion, which significantly improves the quality of the predicted pronunciations. When applied to OOV STD, this approach achieves a considerable performance improvement with both a 1-best system and an SPM-based system.

Type: Conference
City: Makuhari
Date: 2010-09-26
Department: Data Science
Eurecom Ref: 3156

© ISCA. Personal use of this material is permitted. The definitive version of this paper was published in INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, September 26-30, 2010, Makuhari, Japan, and is available at: http://dx.doi.org/10.21437/Interspeech.2010-481