Robust speech recognition in multi-source noise environments using convolutive non-negative matrix factorization

Vipperla, Ravichander; Bozonnet, Simon; Wang, Dong; Evans, Nicholas

CHIME 2011, 1st International Workshop on Machine Listening in Multisource Environments, Interspeech, September 1st, 2011, Florence, Italy

Convolutive non-negative matrix factorization (CNMF) is an effective approach for supervised audio source separation. It relies on the availability of sufficient training data to learn a set of bases for each acoustic source. For automatic speech recognition (ASR) in a multi-source noise environment, the varied nature of background noise makes it challenging to learn the noise bases and thereby to suppress the noise in the speech signal using CNMF. A large amount of training data is required to reliably capture noise variation, but this generally leads to an unacceptable computational burden. Here, we address this problem by learning the noise bases using a computationally efficient, online CNMF approach. By learning the noise bases from several hours of ambient noise data and over a few seconds of local acoustic context, we show that background noise in noisy speech can be effectively attenuated. ASR accuracies on the CHiME corpus with the denoised speech show relative improvements ranging from 42.3% at -6 dB signal-to-noise ratio (SNR) to 2.5% at 9 dB SNR.
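
The supervised separation idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it uses plain batch CNMF rather than the online variant, and the KL-divergence multiplicative updates, the soft-mask reconstruction, and all names (cnmf, denoise, shift, reconstruct, EPS) are assumptions made for illustration. Bases are stored as an array W of shape (T, F, R), for T time lags, F frequency bins and R bases, and the model is V ≈ sum over t of W[t] times H shifted right by t frames.

```python
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero (illustrative choice)


def shift(X, t):
    """Shift the columns of X right by t frames (left if t < 0), zero-filling."""
    out = np.zeros_like(X)
    if t == 0:
        out[:] = X
    elif t > 0:
        out[:, t:] = X[:, :-t]
    else:
        out[:, :t] = X[:, -t:]
    return out


def reconstruct(W, H):
    """Convolutive model: sum over lags t of W[t] @ shift(H, t)."""
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))


def cnmf(V, R=None, T=None, n_iter=100, W=None, seed=0):
    """Factorize a non-negative spectrogram V of shape (F, N).

    If W is given, the bases are kept fixed and only the activations H
    are estimated (the supervised separation step). Otherwise R bases
    spanning T frames are learned from V.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    update_W = W is None
    if update_W:
        W = rng.random((T, F, R)) + EPS
    else:
        T, _, R = W.shape
    H = rng.random((R, N)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Multiplicative update of H under the KL divergence.
        ratio = V / (reconstruct(W, H) + EPS)
        num = sum(W[t].T @ shift(ratio, -t) for t in range(T))
        den = sum(W[t].T @ shift(ones, -t) for t in range(T))
        H = H * num / (den + EPS)
        if update_W:
            # Multiplicative update of each lag of W, with H held fixed.
            ratio = V / (reconstruct(W, H) + EPS)
            for t in range(T):
                Ht = shift(H, t)
                W[t] *= (ratio @ Ht.T) / (ones @ Ht.T + EPS)
    return W, H


def denoise(V_mix, W_speech, W_noise, n_iter=100):
    """Estimate the speech magnitude spectrogram from a noisy mixture.

    The speech and noise bases (same number of lags T) are concatenated
    and held fixed; only the activations are fitted to the mixture.
    """
    W = np.concatenate([W_speech, W_noise], axis=2)
    _, H = cnmf(V_mix, W=W, n_iter=n_iter)
    n_speech = W_speech.shape[2]
    speech = reconstruct(W_speech, H[:n_speech])
    noise = reconstruct(W_noise, H[n_speech:])
    # Soft (Wiener-like) mask applied to the mixture.
    return V_mix * speech / (speech + noise + EPS)
```

In such a setup, cnmf would be run once on clean speech spectrograms and once on noise-only spectrograms to obtain W_speech and W_noise, and denoise would then be applied to each noisy utterance before feature extraction for ASR. The cost of learning W grows with the amount of training data, which is the burden the online approach in the paper is meant to reduce.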

Title: Robust speech recognition in multi-source noise environments using convolutive non-negative matrix factorization
Keywords: Convolutive non-negative matrix factorization, online CNMF, speech separation, automatic speech recognition
Type: Conference
Language: English
City: Florence
Country: Italy
Date:
Department: Digital Security
Eurecom ref: 3414
Copyright: © ISCA. Personal use of this material is permitted. The definitive version of this paper was published in CHIME 2011, 1st International Workshop on Machine Listening in Multisource Environments, Interspeech, September 1st, 2011, Florence, Italy and is available at :
Bibtex:
@inproceedings{EURECOM+3414,
  year = {2011},
  title = {{R}obust speech recognition in multi-source noise environments using convolutive non-negative matrix factorization},
  author = {{V}ipperla, {R}avichander and {B}ozonnet, {S}imon and {W}ang, {D}ong and {E}vans, {N}icholas},
  booktitle = {{CHIME} 2011, 1st {I}nternational {W}orkshop on {M}achine {L}istening in {M}ultisource {E}nvironments, {I}nterspeech, {S}eptember 1st, 2011, {F}lorence, {I}taly},
  address = {{F}lorence, {ITALY}},
  month = {09},
  url = {http://www.eurecom.fr/publication/3414}
}