Fepstrum and carrier signal decomposition of speech signals through homomorphic filtering

Tyagi, Vivek; Wellekens, Christian J
ICASSP 2006, 31st International Conference on Acoustics, Speech, and Signal Processing, Special session, Dealing with intrinsic speech variabilities in ASR, Volume 5, May 14-19, 2006, Toulouse, France

Amplitude modulation (AM) and frequency modulation (FM) have been well defined and studied in the context of communication systems [10]. Building on these ideas, several researchers have applied AM-FM modeling [6, 7, 8, 9] to speech signals, with mixed results. These techniques differ in how they define the AM and FM signals and, consequently, in the demodulation methods they use. In this paper, we carefully define AM and FM signals in the context of ASR. We show that a theoretically meaningful estimate of the AM signal requires decomposing the speech signal into several narrow spectral bands. This contrasts with the previous use of the speech modulation spectrum [6, 7, 8, 9], which was derived by decomposing the speech signal into progressively wider spectral bands (such as critical, Bark or Mel bands). Due to the Hilbert relationships, the AM signal induces a component in the FM signal that is fully determinable from the AM signal [1, 3]. We present a novel homomorphic filtering technique that suppresses this redundant part of the FM signal and extracts the remaining FM signal. The estimated AM message signals are downsampled and their lower DCT coefficients are retained as speech features. These features carry information that is complementary to the MFCCs. A Tandem [4] combination of these two feature sets is shown to improve recognition accuracy.
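The AM feature pipeline described above (narrow-band decomposition, AM estimation, downsampling, lower DCT coefficients) can be loosely sketched for a single band as follows. This is not the authors' implementation. The band edges, filter order, envelope rate, number of coefficients, use of the Hilbert envelope with log compression, and the function names `band_envelope` and `envelope_dct_features` are all assumptions made for illustration, and the homomorphic extraction of the FM signal is not shown.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert
from scipy.fft import dct


def band_envelope(x, fs, band=(1000.0, 1200.0), order=4):
    """Estimate the AM envelope of one narrow band (band edges are hypothetical)."""
    # Zero-phase Butterworth band-pass filter isolating a narrow spectral band.
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    xb = sosfiltfilt(sos, x)
    # Envelope taken as the magnitude of the analytic signal (assumed estimator).
    return np.abs(hilbert(xb))


def envelope_dct_features(env, fs, env_fs=100, n_coef=8):
    """Downsample the envelope and keep the lower DCT coefficients of its log."""
    step = int(fs // env_fs)
    n = (len(env) // step) * step
    # Downsample by block averaging (an assumed, simple decimation scheme).
    env_ds = env[:n].reshape(-1, step).mean(axis=1)
    # Log compression; the small constant guards against log(0).
    log_env = np.log(env_ds + 1e-10)
    # Retain only the first n_coef DCT-II coefficients.
    return dct(log_env, type=2, norm="ortho")[:n_coef]


if __name__ == "__main__":
    # Synthetic AM tone: 4 Hz message on a 1100 Hz carrier, 1 s at 8 kHz.
    fs = 8000
    t = np.arange(fs) / fs
    x = (1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 1100 * t)
    feats = envelope_dct_features(band_envelope(x, fs), fs)
    print(feats.shape)
```

In a full front end, one would presumably repeat this over a bank of narrow bands and over short analysis windows, then concatenate the per-band coefficients; those design choices are not specified in the text above.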

