Amplitude modulation (AM) and frequency modulation (FM) are well defined and well studied in the context of communication systems. Borrowing from these ideas, several researchers have applied AM-FM modeling [6, 7, 8, 9] to speech signals, with mixed results. These techniques differ in how they define the AM and FM signals and, consequently, in the demodulation methods they use. In this paper, we carefully define AM and FM signals in the context of automatic speech recognition (ASR). We show that a theoretically meaningful estimate of the AM signal requires decomposing the speech signal into several narrow spectral bands, in contrast to earlier uses of the speech modulation spectrum [6, 7, 8, 9], which was derived by decomposing the speech signal into increasingly wider spectral bands (such as critical, Bark, or Mel bands). Because of the Hilbert relationships, the AM signal induces a component in the FM signal that is fully determinable from the AM signal [1, 3]. We present a novel homomorphic filtering technique that suppresses this redundant part of the FM signal and extracts the remaining FM signal. The estimated AM message signals are downsampled, and their lower-order DCT coefficients are retained as speech features. These features carry information that is complementary to MFCCs. A Tandem combination of the two feature sets is shown to improve recognition accuracy.
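The AM feature path described in the abstract (narrow-band signal, AM envelope, downsampling, lower DCT coefficients) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the decimation factor, the number of retained coefficients, and the use of the analytic-signal magnitude as the AM estimate are all assumptions made for illustration. The homomorphic filtering step and the narrow-band filterbank are not reproduced here; the input is assumed to be a single band that has already been isolated.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: keep DC, double positive frequencies,
    zero negative frequencies (the usual discrete Hilbert construction)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0        # Nyquist bin is kept once
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def dct2(x):
    """Unnormalized DCT-II, computed directly from its definition."""
    n = len(x)
    k = np.arange(n)[:, None]        # output coefficient index
    m = np.arange(n)[None, :]        # input sample index
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x

def am_features(band_signal, decimation=8, n_coeffs=5):
    """Hypothetical feature extractor for ONE narrow band (assumed names
    and defaults): envelope -> naive downsampling -> lower DCT terms."""
    envelope = np.abs(analytic_signal(band_signal))  # AM estimate
    decimated = envelope[::decimation]               # no anti-alias filter
    return dct2(decimated)[:n_coeffs]
```

For a synthetic band such as `(1 + 0.5 * cos(2*pi*10*t)) * cos(2*pi*1000*t)` sampled at 8 kHz for one second, `np.abs(analytic_signal(x))` recovers the slowly varying envelope, and `am_features(x)` returns a short vector summarizing it. A practical system would apply a low-pass filter before downsampling and would repeat this per band.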
Fepstrum and carrier signal decomposition of speech signals through homomorphic filtering
ICASSP 2006, 31st International Conference on Acoustics, Speech, and Signal Processing, Special session, Dealing with intrinsic speech variabilities in ASR, Volume 5, May 14-19, 2006, Toulouse, France
© 2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
PERMALINK : https://www.eurecom.fr/publication/1936