Dong Wang, Ravichander Vipperla, Nicholas Evans and Thomas Fang Zheng
Research Report RR-11-261
Abstract: The unsupervised learning of spectro-temporal patterns within speech signals is of interest in a broad range of applications. Where patterns are non-negative and convolutive in nature, relevant learning algorithms include convolutive non-negative matrix factorization (CNMF) and its sparse alternative, convolutive non-negative sparse coding (CNSC). Both algorithms, however, place unrealistic demands on computing power and memory, which prohibit their application in large-scale tasks. This paper proposes a new online implementation of CNMF and CNSC that processes input data piece-by-piece and updates the learned patterns gradually with accumulated statistics. The proposed approach facilitates pattern learning with huge volumes of training data that are beyond the capability of existing alternatives. We show that the new online learning algorithm almost surely converges to the same cost value as the standard batch learning approach when both computing resources and data are unlimited, and that it outperforms batch learning in two experiments with practical computing resources and data quantities.
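To illustrate the general idea of "processing input data piece-by-piece and updating learned patterns with accumulated statistics," the sketch below shows a minimal online factorization loop. It is not the report's algorithm: it uses plain (non-convolutive, non-sparse) NMF with a Frobenius-norm cost and multiplicative updates, and the function name and parameters are hypothetical. Activations for each data piece are inferred with the patterns fixed, then only the sufficient statistics A = sum of H H^T and B = sum of V H^T are retained to refresh the patterns, so past data never needs to be revisited.

```python
import numpy as np

def online_nmf(batches, n_components, n_inner=50, eps=1e-12, seed=0):
    """Online NMF with accumulated statistics (simplified, non-convolutive sketch).

    batches: iterable of non-negative matrices V_t, each of shape (n_features, n_samples_t).
    Returns the learned basis (pattern) matrix W of shape (n_features, n_components).
    """
    rng = np.random.default_rng(seed)
    W = None
    A = np.zeros((n_components, n_components))   # accumulates H @ H.T over all pieces
    B = None                                      # accumulates V @ H.T over all pieces

    for V in batches:
        n_features = V.shape[0]
        if W is None:
            # Random non-negative initialization of the patterns.
            W = np.abs(rng.standard_normal((n_features, n_components)))
            B = np.zeros((n_features, n_components))

        # Infer activations H for this piece with W fixed
        # (standard multiplicative updates for the Frobenius-norm cost).
        H = np.abs(rng.standard_normal((n_components, V.shape[1])))
        for _ in range(n_inner):
            H *= (W.T @ V) / (W.T @ W @ H + eps)

        # Accumulate sufficient statistics instead of storing past data.
        A += H @ H.T
        B += V @ H.T

        # Gradually refresh the patterns from the accumulated statistics,
        # analogous to the batch update W <- W * (V H^T) / (W H H^T).
        W *= B / (W @ A + eps)

    return W
```

Because only the small matrices A and B grow with the amount of data seen, the memory footprint stays constant regardless of how many pieces are processed, which is what makes this style of update attractive for large-scale pattern learning.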