In this paper we investigate the use of Variational Bayesian methods for audio indexing. Variational Bayesian (VB) techniques are approximate techniques for fully Bayesian learning. Unlike non-Bayesian methods (e.g. Maximum Likelihood) or partially Bayesian criteria (e.g. Maximum a Posteriori), VB offers important model selection properties. VB learning is based on optimization of the free energy, which can serve at the same time as an objective function and as a model selection criterion, so that model learning and model selection are carried out simultaneously. Here we explore the use of VB learning and VB model selection in a speaker clustering task, comparing results with classical learning techniques (ML and MAP) and classical model selection criteria (BIC). Experiments are run on the NIST-1996 HUB-4 evaluation data set, and results show that VB can outperform classical methods.
Variational Bayesian methods for audio indexing
MLMI 2005, 2nd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms, July 11-13, 2005, Edinburgh, UK / Also published in LNCS Volume 3869/2006
© Springer. Personal use of this material is permitted. The definitive version of this paper was published in MLMI 2005, 2nd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms, July 11-13, 2005, Edinburgh, UK (also published in LNCS Volume 3869/2006) and is available at http://dx.doi.org/10.1007/11677482_27
PERMALINK : https://www.eurecom.fr/publication/1692