On multi-scale piecewise stationary spectral analysis of speech signals for robust ASR

Tyagi, Vivek;Wellekens, Christian J
Research report RR-04-118

Fixed-scale short-time spectral analysis (typically with a 25 ms window) is clearly sub-optimal in time-frequency resolution for speech signals, which are inherently multi-scale [7]: vowels typically last 40-80 ms, while stops last 10-20 ms. In this work, we detect piecewise quasi-stationary speech segments based on the likelihood of each segment, which in turn is estimated from the linear prediction (LP) residual error. A window equal in length to the detected quasi-stationary segment is then used to obtain its spectral estimate. This approach adaptively chooses the largest window over which the signal remains quasi-stationary, and excludes the adjoining quasi-stationary segments from that window. Experiments show that features based on the proposed multi-scale piecewise stationary spectral analysis improve recognition accuracy in clean conditions when compared directly with features based on fixed-scale spectral analysis.
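The segmentation idea in the abstract can be illustrated with a minimal sketch. It is not the report's actual algorithm: it assumes a Gaussian LP residual, so that a segment's log-likelihood depends only on its residual variance, and it uses a standard generalized likelihood ratio (GLR) test to decide whether a growing window should be split. The function names, the LP order, the threshold, and the window sizes (80 samples, i.e. 10 ms at an assumed 8 kHz sampling rate) are all hypothetical choices made for illustration.

```python
import numpy as np

def lp_residual_variance(x, order=8):
    """Per-sample LP residual energy of x (autocorrelation method)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Biased autocorrelation estimates for lags 0..order.
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)]) / n
    # Toeplitz normal equations R a = r[1:], with a small ridge for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * (r[0] + 1e-12) * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    err = r[0] - np.dot(a, r[1:])
    return max(err, 1e-12)

def glr_distance(x, split, order=8):
    """GLR statistic: one LP model for x versus two models split at `split`.

    Assumes a Gaussian residual, so each log-likelihood is
    -(N/2) * log(residual variance) up to a constant.
    """
    n = len(x)
    v0 = lp_residual_variance(x, order)
    v1 = lp_residual_variance(x[:split], order)
    v2 = lp_residual_variance(x[split:], order)
    return 0.5 * (n * np.log(v0) - split * np.log(v1) - (n - split) * np.log(v2))

def segment_quasi_stationary(x, order=8, min_len=80, step=40, threshold=20.0):
    """Greedy segmentation: grow the window until its tail fits a different model.

    Returns the sample indices at which a new quasi-stationary segment starts.
    All parameter values are illustrative assumptions, not the report's settings.
    """
    x = np.asarray(x, dtype=float)
    boundaries = []
    start, end = 0, 2 * min_len
    while end <= len(x):
        split = end - min_len - start  # split index relative to the window
        if glr_distance(x[start:end], split, order) > threshold:
            start = end - min_len      # the tail begins a new segment
            boundaries.append(start)
            end = start + 2 * min_len
        else:
            end += step                # still quasi-stationary: keep growing
    return boundaries
```

Each span between consecutive boundaries would then be analysed with a single window of that span's length, rather than with a fixed 25 ms window. In practice the threshold trades missed boundaries against over-segmentation and would need tuning on real speech.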

Digital Security
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Research report RR-04-118 and is available at:

PERMALINK : https://www.eurecom.fr/publication/1514