Least squares filtering of speech signals for robust ASR

Tyagi, Vivek; Wellekens, Christian J; Slock, Dirk T M
Speech Communication, Volume 48, Issue 11, November 2006

The behavior of the least squares filter (LeSF) is analyzed for a class of nonstationary signals embedded in white noise that are either (a) composed of multiple sinusoids (voiced speech) whose frequencies, phases and amplitudes may vary from block to block, or (b) the output of an all-pole filter excited by white noise (unvoiced speech segments). In this work, analytic expressions for the weights and the output of the LeSF are derived as functions of the block length and the signal SNR computed over the corresponding block. We have used the LeSF estimated on each block to enhance speech signals embedded in white noise as well as in other realistic noises such as factory noise and aircraft cockpit noise. Automatic speech recognition (ASR) experiments on a connected numbers task, OGI Numbers95 [29], show that the proposed LeSF-based features provide a significant improvement in speech recognition accuracy in various nonstationary noise conditions when compared directly to un-enhanced speech, spectral subtraction and noise-robust CJ-RASTA-PLP features.
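
As an illustration of block-wise least squares filtering, the following is a minimal sketch and not the paper's implementation. It assumes the filter takes the form of a delayed linear predictor whose weights are re-estimated on every block by ordinary least squares. The function names (`lesf_block`, `enhance`), the filter order, the delay, the block length and the synthetic two-sinusoid test signal are all hypothetical choices made for the example.

```python
import numpy as np

def lesf_block(x, order=16, delay=1):
    """Fit a least squares predictor on one block and return (weights, output).

    Assumed form: x[n] is predicted from x[n-delay], ..., x[n-delay-order+1].
    """
    x = np.asarray(x, dtype=float)
    start = delay + order - 1                   # first sample with a full history
    rows = np.arange(start, len(x))[:, None]    # time index n of each equation
    lags = delay + np.arange(order)[None, :]    # lags used by each filter tap
    A = x[rows - lags]                          # delayed-input data matrix
    w, *_ = np.linalg.lstsq(A, x[start:], rcond=None)
    y = x.copy()                                # leading samples pass through unchanged
    y[start:] = A @ w                           # filter output for the rest of the block
    return w, y

def enhance(x, block_len=512, order=16, delay=1):
    """Apply an independently estimated filter to each consecutive block."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for b in range(0, len(x), block_len):
        blk = x[b:b + block_len]
        if len(blk) <= 2 * order + delay:       # too short to fit reliably: copy as-is
            out[b:b + block_len] = blk
        else:
            out[b:b + block_len] = lesf_block(blk, order, delay)[1]
    return out

# Synthetic stand-in for a voiced segment: two sinusoids in white noise.
rng = np.random.default_rng(0)
n = np.arange(2048)
clean = np.sin(2 * np.pi * 0.05 * n) + 0.5 * np.sin(2 * np.pi * 0.12 * n + 1.0)
noisy = clean + rng.normal(scale=0.5, size=n.size)
enhanced = enhance(noisy)

mse_in = np.mean((noisy - clean) ** 2)
mse_out = np.mean((enhanced - clean) ** 2)
print(f"MSE before: {mse_in:.4f}, after: {mse_out:.4f}")
```

Because white noise is uncorrelated across the delay while sinusoids remain predictable, the fitted predictor tends to retain the sinusoidal components and attenuate the noise. How much it retains depends on the filter length and the block SNR, which is the dependence the abstract says is derived analytically.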

Digital Security
© Elsevier. Personal use of this material is permitted. The definitive version of this paper was published in Speech Communication, Volume 48, Issue 11, November 2006 and is available at: http://dx.doi.org/10.1016/j.specom.2006.07.010

PERMALINK : https://www.eurecom.fr/publication/2004