Variational Bayesian methods for audio indexing

Valente, Fabio

Model selection is a central issue in many machine learning problems. In many real-data applications, a hypothesis about the model is made before proceeding with the learning task. If the hypothesized model does not respect the structure of the experimental data, the effectiveness of learning can be strongly affected. Hence the need for techniques that can select the model that best fits the data. The probabilistic framework is widely used for model selection. It considers probabilities over different models and assumes that the best model is the one that maximizes the model probability given the observed data, i.e. given a model m and an observed data set D, the best model maximizes P(m|D). Estimation of the model probability can be parametric or non-parametric; in many real-data problems, a parametric model is assumed for tractability reasons. In those cases the model probability can be obtained by marginalizing over all model parameters. Depending on the model complexity, the integration cannot always be done in closed form, and approximate techniques must be used instead. The most widely used approximations are sometimes inappropriate for the application at hand and need heuristic tuning to be effective. In this thesis we discuss a new type of approximate method called Variational Learning (a.k.a. Ensemble Learning) that allows an approximate closed-form solution to the parameter integration problem. The key to Variational methods is the replacement of the true unknown parameter distributions with approximate distributions (Variational distributions) that make the solution analytically tractable. Obviously the effectiveness of this approach depends on how close the approximate distributions are to the true distributions. In this thesis we investigate the use of Variational techniques in an audio indexing task in which model selection is a main problem.
Audio indexing consists in clustering together the parts of an audio file that share the same acoustic characteristics. In particular, we consider here the case in which data coming from the same speaker must be clustered. Model selection is a central issue in such applications because the number of clusters (speakers) is generally not known a priori and must be estimated from the data. The most popular approach to this problem, the Bayesian Information Criterion (BIC), uses a very crude approximation of the integral for the model selection task that is actually valid only asymptotically. In order to obtain reasonable results in the limited-data case, a heuristic adjustment of the model selection criterion is made. This often causes serious tuning problems, and the final result is strongly affected by the tuning. Variational methods do not need any heuristic tuning and are not based on large-data limits; for this reason they turn out to be more effective than the BIC.
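
As a minimal sketch of the quantities discussed in this abstract, the relations below write out the model selection rule, the parameter integral, the variational bound and the BIC in standard textbook notation. Only m, D and P(m|D) come from the text. The symbols \theta (model parameters), q(\theta) (Variational distribution), F_m (the bound, often called free energy), \hat{\theta} (maximum likelihood estimate), d_m (number of free parameters), N (number of observations) and \lambda (tuning weight of the penalty) are assumptions introduced for illustration; the thesis itself may use different notation or a formulation that also includes hidden variables.

```latex
% Model selection: pick the model with the highest posterior probability
\hat{m} = \arg\max_m P(m \mid D), \qquad P(m \mid D) \propto P(D \mid m)\, P(m)

% Model evidence obtained by marginalizing over the parameters
P(D \mid m) = \int P(D \mid \theta, m)\, P(\theta \mid m)\, d\theta

% Variational lower bound on the log evidence (Jensen's inequality)
\ln P(D \mid m) \;\ge\; F_m(q) = \int q(\theta)\, \ln \frac{P(D, \theta \mid m)}{q(\theta)}\, d\theta

% The gap is the KL divergence between q and the true parameter posterior
\ln P(D \mid m) - F_m(q) = \mathrm{KL}\big( q(\theta) \,\|\, P(\theta \mid D, m) \big) \;\ge\; 0

% BIC: large-N approximation of the log evidence, with a tunable penalty weight
\ln P(D \mid m) \;\approx\; \ln P(D \mid \hat{\theta}, m) - \frac{\lambda}{2}\, d_m \ln N
```

The KL term makes precise the remark that the approach is only as good as the closeness of the approximate distributions to the true ones. In the BIC expression, \lambda = 1 gives the asymptotic form, while other values of \lambda correspond to the kind of heuristic adjustment described in the abstract.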

Digital Security
© EPFL. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :