More than three decades of speech recognition research have resulted in a very sophisticated statistical framework. However, comparatively little attention has been devoted to the diagnostics of speech recognition: most previous research reports results in terms of ever-lower word error rates (WERs) under various intrinsic or environmental conditions. This paper presents a diagnostic analysis of the decoding process of automatic speech recognition (ASR) systems. The purpose of our diagnostics is to go beyond standard evaluation in terms of WERs and confusion matrices and to examine the recognized output in more detail. During the decoding phase, specific data that may explain recognition errors are collected at the decoder and later analyzed statistically using classification and regression trees. Focusing on purely acoustic phone decoding without language modeling, we present and discuss the results of these diagnostics, which we use to analyze the impact of intrinsic speech variabilities on speech recognition.
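To make the tree-based analysis step more concrete, here is a minimal sketch of a CART-style classification tree grown with Gini impurity in plain Python. It is not the authors' implementation. It assumes each decoded phone has already been described by numeric features gathered at the decoder and labeled as correctly recognized (0) or misrecognized (1). The feature names used here (`duration_frames`, `speaking_rate`) and the helper names (`gini`, `best_split`, `build_tree`, `predict`) are hypothetical illustrations. The sketch covers classification only and omits regression trees and pruning.

```python
from dataclasses import dataclass
from typing import List, Optional


def gini(labels: List[int]) -> float:
    """Gini impurity of a set of binary labels (1 = phone error, 0 = correct)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


@dataclass
class Node:
    feature: Optional[int] = None      # index of the feature tested at this node
    threshold: Optional[float] = None  # go left if value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prediction: int = 0                # majority label, used at leaves


def best_split(rows: List[List[float]], labels: List[int]):
    """Search every feature/midpoint for the binary split with lowest weighted Gini."""
    best_f, best_t, best_score = None, None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:  # keep only strict improvements
                best_f, best_t, best_score = f, t, score
    return best_f, best_t


def build_tree(rows, labels, depth=0, max_depth=3, min_size=2) -> Node:
    """Recursively grow the tree until it is pure, too small, or too deep."""
    majority = 1 if 2 * sum(labels) > len(labels) else 0
    node = Node(prediction=majority)
    if depth >= max_depth or len(labels) < min_size or gini(labels) == 0.0:
        return node
    f, t = best_split(rows, labels)
    if f is None:  # no split reduces impurity
        return node
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    node.feature, node.threshold = f, t
    node.left = build_tree([rows[i] for i in li], [labels[i] for i in li],
                           depth + 1, max_depth, min_size)
    node.right = build_tree([rows[i] for i in ri], [labels[i] for i in ri],
                            depth + 1, max_depth, min_size)
    return node


def predict(node: Node, row: List[float]) -> int:
    """Follow the learned tests down to a leaf and return its majority label."""
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.prediction


# Toy, made-up data: [duration_frames, speaking_rate] per decoded phone.
rows = [[3, 5.0], [4, 6.1], [2, 5.5], [9, 4.0], [10, 3.8], [8, 4.2]]
labels = [1, 1, 1, 0, 0, 0]
tree = build_tree(rows, labels)
print(tree.feature, tree.threshold)  # root test chosen by the Gini criterion
```

On this invented data, short phones are all labeled as errors, so the root splits on the duration feature. In a real diagnostic study, the path from the root to each leaf is what makes trees attractive here: each leaf reads as a human-interpretable condition under which errors concentrate.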
Diagnostics of speech recognition using classification phoneme diagnostic trees
CI 2006, 2nd IASTED International Conference on Computational Intelligence, Special Session on NLP, November 20-22, 2006, San Francisco, USA
© IASTED. Personal use of this material is permitted. The definitive version of this paper was published in CI 2006, 2nd IASTED International Conference on Computational Intelligence, Special Session on NLP, November 20-22, 2006, San Francisco, USA, and is available at: http://www.actapress.com/Abstract.aspx?paperId=29117
Permalink: https://www.eurecom.fr/publication/2021