Comparative study of different features on OLLO logatome recognition task

Tyagi, Vivek; BenZeghiba, Mohamed Faouzi; Cernak, Milos; Wellekens, Christian J
SRIV 2006, ITRW on Speech Recognition and Intrinsic Variation, May 20, 2006, Toulouse, France

We compare the ASR performance of different feature sets (MFCC, PLP, constant JRASTA PLP, and variable-scale piecewise quasi-stationary analyzed MFCC features [1]) on the OLdenburg LOgatome speech corpus (OLLO) [2]. The OLLO database is rich in speech variabilities such as different speaking styles (slow, fast, statement, questioning, loud and soft), with almost equal sampling of male and female speakers. An HMM-GMM system was trained on the no-accent part of the OLLO database, which consists of roughly 13,500 utterances, and then tested on the no-accent part of the test set, which consists of roughly 13,800 utterances. Each of these utterances corresponds to a logatome. We compare state-of-the-art fixed time scale features (20 ms long windows) with the recently proposed variable-scale quasi-stationary analyzed MFCC features [1]. This technique results in a variable-scale time spectral analysis that adaptively estimates the largest possible analysis window size over which the signal remains quasi-stationary, and thus the best temporal/frequency resolution tradeoff. The speech recognition experiments on the OLLO database show that the proposed variable-scale piecewise stationary spectral analysis based features indeed yield improved recognition accuracy in clean conditions, compared to MFCC, PLP and constant JRASTA PLP features.
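The adaptive window idea can be illustrated with a minimal sketch. The abstract does not state which stationarity criterion [1] uses, so the test below is an assumption made purely for illustration: a generalized likelihood ratio between "one zero-mean Gaussian" and "two zero-mean Gaussians with different variances". The function names, window limits and threshold are hypothetical and are not taken from the paper.

```python
import math


def mean_square(samples):
    """Average power of a block of samples."""
    return sum(v * v for v in samples) / len(samples)


def glr_variance(left, right, eps=1e-12):
    """Log generalized likelihood ratio: one variance vs. two variances.

    Stand-in stationarity test (an assumption, not the criterion of [1]).
    The value is near 0 when both blocks have the same power and grows
    when their power differs.
    """
    n1, n2 = len(left), len(right)
    v1 = mean_square(left) + eps
    v2 = mean_square(right) + eps
    pooled = (n1 * v1 + n2 * v2) / (n1 + n2)
    return 0.5 * ((n1 + n2) * math.log(pooled)
                  - n1 * math.log(v1) - n2 * math.log(v2))


def variable_scale_segments(signal, fs, min_ms=10, max_ms=60, step_ms=10,
                            threshold=20.0):
    """Split a signal into the longest windows that pass the test.

    Each window starts at min_ms and is extended by step_ms as long as the
    new block looks like a continuation of the current window, up to max_ms.
    All default values are illustrative assumptions.
    """
    min_len = fs * min_ms // 1000
    max_len = fs * max_ms // 1000
    step = fs * step_ms // 1000
    segments = []
    start, n = 0, len(signal)
    while start + min_len <= n:
        length = min_len
        while length + step <= max_len and start + length + step <= n:
            left = signal[start:start + length]
            right = signal[start + length:start + length + step]
            if glr_variance(left, right) > threshold:
                break  # extending further would mix two regimes
            length += step
        segments.append((start, start + length))
        start += length
    return segments


# Toy signal at 8 kHz: 30 ms of low power followed by 90 ms of high power.
toy = [0.1 * (-1) ** i for i in range(240)] + [2.0 * (-1) ** i for i in range(720)]
segments = variable_scale_segments(toy, fs=8000)
print(segments)
```

On this toy signal the first window stops at the power change instead of at a fixed 20 ms boundary, while the stationary stretch that follows receives the longest window allowed. A fixed-scale front end would instead use the same window length everywhere.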


Type:
Conference
City:
Toulouse
Date:
2006-05-20
Department:
Digital Security
Eurecom Ref:
2055
Copyright:
© ISCA. Personal use of this material is permitted. The definitive version of this paper was published in SRIV 2006, ITRW on Speech Recognition and Intrinsic Variation, May 20, 2006, Toulouse, France and is available at :

PERMALINK : https://www.eurecom.fr/publication/2055