
Parallel and hierarchical decision making for sparse coding in speech recognition

Wang, Dong; Vipperla, Ravichander; Evans, Nicholas

INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association (ISCA), August 28-31, 2011, Florence, Italy

Sparse coding exhibits promising performance in speech processing, mainly due to the large number of bases that can be used to represent speech signals. However, the high demand for computational power represents a major obstacle in the case of large datasets, as does the difficulty in utilising information scattered sparsely in high dimensional features. This paper reports the use of an online dictionary learning technique, proposed recently by the machine learning community, to learn large scale bases efficiently, and proposes a new parallel and hierarchical architecture to make use of the sparse information in high dimensional features. The approach uses multilayer perceptrons (MLPs) to model sparse feature subspaces and make local decisions accordingly; the latter are integrated by additional MLPs in a hierarchical way for making global decisions. Experiments on the WSJ database show that the proposed approach not only solves the problem of prohibitive computation with large-dimensional sparse features, but also provides better performance in a frame-level phone prediction task.
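As a rough illustration (not the authors' implementation), the following Python sketch mimics the pipeline described in the abstract using scikit-learn: online dictionary learning (MiniBatchDictionaryLearning, in the spirit of the online technique cited) to obtain sparse codes, one MLP per sparse feature subspace for local decisions, and a merger MLP that integrates the local posteriors into a global frame-level phone decision. The feature dimensions, dictionary size, number of subspaces, equal-width subspace split, and MLP layer sizes are all illustrative assumptions, and the toy data stands in for real WSJ frames.

# Hedged sketch of sparse coding + parallel/hierarchical MLP decisions.
# All sizes and the data itself are placeholders, not the paper's setup.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy stand-ins for spectral frames and frame-level phone labels.
n_frames, n_dims, n_phones = 2000, 39, 10
X = rng.standard_normal((n_frames, n_dims))
y = rng.integers(0, n_phones, size=n_frames)

# 1) Learn a large dictionary online and encode each frame sparsely.
n_bases = 256                      # "large number of bases"
coder = MiniBatchDictionaryLearning(n_components=n_bases, alpha=1.0,
                                    batch_size=64, random_state=0)
S = coder.fit(X).transform(X)      # sparse codes, shape (n_frames, n_bases)

# 2) Parallel stage: split the high-dimensional sparse code into subspaces
#    and train one MLP per subspace to produce local phone posteriors.
n_subspaces = 4
subspaces = np.array_split(np.arange(n_bases), n_subspaces)
local_posteriors = []
for idx in subspaces:
    mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
    mlp.fit(S[:, idx], y)
    local_posteriors.append(mlp.predict_proba(S[:, idx]))

# 3) Hierarchical stage: a merger MLP integrates the local posteriors
#    into a global frame-level phone decision.
Z = np.hstack(local_posteriors)
merger = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
merger.fit(Z, y)
frame_phone_pred = merger.predict(Z)

In this sketch the codes are trained and evaluated on the same toy data purely to keep the example short; a real frame-level phone prediction experiment would of course use separate training and test sets.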


Title: Parallel and hierarchical decision making for sparse coding in speech recognition
Keywords: sparse coding, feature extraction, posterior feature, speech recognition
Type: Conference
Language: English
City: Florence
Country: Italy
Date:
Department: Digital Security
Eurecom ref: 3410
Copyright: © ISCA. Personal use of this material is permitted. The definitive version of this paper was published in INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, August 28-31, 2011, Florence, Italy, and is available at:
Bibtex:
@inproceedings{EURECOM+3410,
  year      = {2011},
  title     = {{P}arallel and hierarchical decision making for sparse coding in speech recognition},
  author    = {{W}ang, {D}ong and {V}ipperla, {R}avichander and {E}vans, {N}icholas},
  booktitle = {{INTERSPEECH} 2011, 12th {A}nnual {C}onference of the {I}nternational {S}peech {C}ommunication, {A}ugust 28-31, {F}lorence, {I}taly},
  address   = {{F}lorence, {ITALIE}},
  month     = {08},
  url       = {http://www.eurecom.fr/publication/3410}
}