Automatic Speech Recognition (ASR) usually works better in close-talking microphone conditions than in far-field conditions. A major challenge for far-field ASR systems is handling background noise, multipath reflections, and reverberation, all of which degrade the quality of the speech signal. To address this, we propose Teager energy-based Gabor filterbank (TGFB) features, which preserve the amplitude and frequency modulation of a resonant signal and improve the time-frequency resolution. In addition, via the TGFB features, we exploit the noise suppression capability of the Teager Energy Operator (TEO) to improve ASR performance under the signal degradation caused by far-field speech. The ASR experiments are performed on the LibriSpeech (near-field) and CHiME-3 (far-field) corpora. In our experiments, TGFB features alone gave marginal improvements over MFCC features, whereas the system-level combination of TGFB and MFCC features gave significant improvements over standalone MFCC features. For the LibriSpeech corpus, a relative Word Error Rate (WER) improvement of close to 5% was observed. For the CHiME-3 corpus, an average relative improvement of 7.20% was obtained over the baseline features using system-level combination.
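For readers unfamiliar with the TEO mentioned above, the following is a minimal sketch of its commonly used discrete form, psi[n] = x[n]^2 - x[n-1] * x[n+1]. It is not the TGFB feature extraction described in the paper: the Gabor filterbank stage is omitted, and the function name `teager_energy` and the test-tone parameters are illustrative assumptions. For a pure tone A*cos(omega*n + phi), the operator returns the constant A^2 * sin^2(omega), which shows how it tracks amplitude and frequency jointly.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator (hypothetical helper name).

    Computes psi[n] = x[n]^2 - x[n-1] * x[n+1] for the interior samples,
    so the output is two samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative test tone; amplitude, frequency, and phase are arbitrary choices.
amplitude, omega, phase = 0.5, 0.3, 0.1
tone = [amplitude * math.cos(omega * n + phase) for n in range(64)]

psi = teager_energy(tone)

# For a pure tone the TEO output is constant: A^2 * sin^2(omega).
expected = (amplitude * math.sin(omega)) ** 2
max_deviation = max(abs(p - expected) for p in psi)
```

In a subband setting, an operator of this kind would typically be applied to each filterbank output separately rather than to the full-band signal; the specifics of that stage are not given in this abstract.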
Teager energy subband filtered features for near and far-field automatic speech recognition
APSIPA ASC 2021, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 14-17 December 2021, Tokyo, Japan
© 2021 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
PERMALINK: https://www.eurecom.fr/publication/6804