On desensitizing the Mel-cepstrum to spurious spectral components for robust speech recognition

Tyagi, Vivek; Wellekens, Christian J
Research report RR-04-119

It is well known that the peaks in the log Mel-filter bank spectrum are important cues for characterizing speech sounds. However, low-energy perturbations in the power spectrum may become numerically significant after log compression. We show that even if the spectral peaks are kept constant, low-energy perturbations in the power spectrum can create large variations in the cepstral coefficients. We show, both analytically and experimentally, that exponentiating the log Mel-filter bank spectrum before the cepstrum computation can significantly reduce the sensitivity of the cepstra to spurious low-energy perturbations. The Mel-cepstrum modulation spectrum [3] is then computed from the processed cepstra, which further improves the noise robustness of the composite feature vector. In experiments with speech signals, features based on the proposed technique yield a significant increase in speech recognition performance in non-stationary noise conditions when compared directly to MFCC and RASTA-PLP features.
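
The sensitivity argument in the abstract can be illustrated with a small numerical sketch. This is not the report's implementation. It assumes that "exponentiating" means raising each log Mel-filter bank value to a power alpha > 1, and the value alpha = 2, the toy spectrum, and the helper names (dct_matrix, cepstra, relative_change) are all hypothetical choices made for illustration. It also assumes the log values are positive so that the power is well defined.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]          # cepstral index
    m = np.arange(n)[None, :]          # filter bank index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)            # scale the first row for orthonormality
    return c

def cepstra(log_mel, alpha=1.0):
    """Cepstra of a log Mel spectrum raised elementwise to the power alpha.

    alpha = 1 gives ordinary cepstra; alpha > 1 is the assumed
    reading of 'exponentiating' (not confirmed by the abstract).
    """
    log_mel = np.asarray(log_mel, dtype=float)
    return dct_matrix(log_mel.size) @ (log_mel ** alpha)

def relative_change(clean, perturbed, alpha):
    """Relative Euclidean distance between clean and perturbed cepstra."""
    c_clean = cepstra(clean, alpha)
    c_pert = cepstra(perturbed, alpha)
    return np.linalg.norm(c_pert - c_clean) / np.linalg.norm(c_clean)

# Toy log Mel spectrum: peaks near 8 to 10, valleys at 2 (made-up values).
clean = np.array([10.0, 2.0, 9.0, 2.0, 8.0, 2.0, 9.0, 2.0])

# Low-energy perturbation: only the valleys move; the peaks stay constant.
perturbed = clean.copy()
perturbed[1::2] += 1.0

rel_plain = relative_change(clean, perturbed, alpha=1.0)
rel_pow = relative_change(clean, perturbed, alpha=2.0)
print(f"relative cepstral change, alpha=1: {rel_plain:.3f}")
print(f"relative cepstral change, alpha=2: {rel_pow:.3f}")
```

In this toy case the relative cepstral change is smaller with alpha = 2 than with alpha = 1, because the power emphasizes the unchanged peaks over the perturbed valleys. The numbers say nothing about recognition accuracy, which the report evaluates experimentally.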


Type:
Report
Date:
2004-09-20
Department:
Digital Security
Eurecom Ref:
1513
Copyright:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Research report RR-04-119 and is available at:

PERMALINK : https://www.eurecom.fr/publication/1513