It is well known that the peaks in log Mel-filter bank spectrum are important cues in characterizing the speech sounds. However, low energy perturbations in the power spectrum may become numerically significant after the log compression. We show that even if the spectral peaks are kept constant, the low energy perturbations in the power spectrum can create huge variations in the cepstral coefficients. We show, both analytically and experimentally, that exponentiating the log Mel-filter bank spectrum before the cepstrum computation can significantly reduce the sensitivity of the cepstra to spurious low energy perturbations. Mel-cepstrum modulation spectrum  is computed from the processed cepstra which results in further noise robustness of the composite feature vector. In experiments with speech signals, it is shown that the proposed technique based features yield a significant increase in speech recognition performance in non-stationary noise conditions when compared directly to the MFCC and RASTA-PLP features.
On desensitizing the Mel-cepstrum to spurious spectral components for robust speech recognition
Research report RR-04-119
PERMALINK : https://www.eurecom.fr/publication/1513