Detecting Novel Terms from Large Audio Archives

Dong Wang - Post Doc MM
Multimedia Communications

Date: Tue, 06/22/2010, 10:00 - 11:00
Location: Eurecom

Retrieving information from multimedia data is highly attractive, given the large amount of audio and video content now available and the rich information it contains. Spoken term detection (STD) is a fundamental task in multimedia information retrieval; it aims to search vast, heterogeneous audio archives for occurrences of spoken terms. A major challenge for STD systems is the significant drop in performance when detecting out-of-vocabulary (OOV) terms. Some terms are OOV because the system lexicon is limited, whereas others arise from the rapid evolution of human languages, with new words and terms emerging from the younger generation every day. In this talk I will present two techniques that address the OOV problem in STD: a stochastic pronunciation model and a discriminative decision strategy. Placed in a decision-theoretic framework, this work leads to a concrete belief: if we say 'everything is statistical', we have to say 'everything is biased'.