Graduate school and research center in digital sciences

CRF-based stochastic pronunciation modeling for out-of-vocabulary spoken term detection

Wang, Dong; King, Simon; Evans, Nicholas; Troncy, Raphaël

INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, September 26-30, 2010, Makuhari, Japan

Out-of-vocabulary (OOV) terms present a significant challenge to spoken term detection (STD). This challenge, to a large extent, lies in the high degree of uncertainty in the pronunciations of OOV terms. In previous work, we presented a stochastic pronunciation modeling (SPM) approach to compensate for this uncertainty. A shortcoming of our original work, however, is that the SPM was based on a joint-multigram model (JMM), which is suboptimal. In this paper, we propose to use conditional random fields (CRFs) for letter-to-sound conversion, which significantly improves the quality of the predicted pronunciations. When applied to OOV STD, we achieve considerable performance improvement with both a 1-best system and an SPM-based system.


Title:CRF-based stochastic pronunciation modeling for out-of-vocabulary spoken term detection
Keywords:speech recognition, spoken term detection, conditional random field, joint multigram model
Type:Conference
Language:English
City:Makuhari
Country:Japan
Date:
Department:Data Science
Eurecom ref:3156
Copyright: © ISCA. Personal use of this material is permitted. The definitive version of this paper was published in INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, September 26-30, 2010, Makuhari, Japan and is available at :
Bibtex:
@inproceedings{EURECOM+3156,
  year = {2010},
  title = {{CRF}-based stochastic pronunciation modeling for out-of-vocabulary spoken term detection},
  author = {{W}ang, {D}ong and {K}ing, {S}imon and {E}vans, {N}icholas and {T}roncy, {R}apha{\"e}l},
  booktitle = {{INTERSPEECH} 2010, 11th {A}nnual {C}onference of the {I}nternational {S}peech {C}ommunication {A}ssociation, {S}eptember 26-30, 2010, {M}akuhari, {J}apan},
  address = {{M}akuhari, {JAPON}},
  month = {09},
  url = {http://www.eurecom.fr/publication/3156}
}