Thesis
Over the last decade, deep learning has reached a sufficient level of maturity
to become the preferred choice to solve machine learning problems or to
aid decision-making processes. At the same time, deep learning models are generally not
equipped to accurately quantify the uncertainty of their predictions, which makes
them less suitable for risk-critical applications. A possible
solution to this problem is to adopt a Bayesian formulation; however,
while this offers an elegant treatment, the resulting posterior is analytically intractable
and requires approximations. Despite the major advances of the last few years, there is
still a long way to go before these approaches become widely applicable. In this thesis, we
address some of the challenges of modern Bayesian deep learning by proposing
and studying solutions that improve the scalability and inference of these models. The
first part of the thesis is dedicated to deep models where inference is carried out
using variational inference (VI). Specifically, we study the role of the initialization
of the variational parameters and show how careful initialization strategies
can make VI deliver good performance even in large-scale models. In this part
of the thesis, we also study the over-regularization effect of the variational objective
on over-parameterized models. To tackle this problem, we propose a novel
parameterization based on the Walsh-Hadamard transform; not only does this solve
the over-regularization issue of VI, but it also allows us to model non-factorized
posteriors while keeping time and space complexity under control. The second part
of the thesis is dedicated to the role of priors. Although priors are an essential
building block of Bayes’ rule, choosing good priors for deep learning models is
generally hard. For this reason, we propose two different strategies, based on (i)
the functional interpretation of neural networks and (ii) a scalable procedure
to perform model selection on the prior hyper-parameters, akin to maximization
of the marginal likelihood. To conclude this part, we analyze a different kind of
Bayesian model (Gaussian processes) and study the effect of placing a prior on all
the hyper-parameters of these models, including the additional variables required
by inducing-point approximations. We also show how to infer free-form
posteriors on these variables, which would otherwise conventionally be
point-estimated.
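To make the complexity argument behind the Walsh-Hadamard parameterization concrete, the following is a minimal sketch, not the thesis's implementation. It assumes a Fastfood-style factorization of a D x D weight matrix, W = diag(s1) H diag(g) H diag(s2), where H is the orthonormal Walsh-Hadamard matrix, s1 and s2 are deterministic vectors, and g carries a (here assumed fully factorized) Gaussian variational posterior. The function names `fast_wht`, `whvi_matvec`, `sample_whvi_matvec` and `dense_hadamard` are hypothetical and chosen for illustration. Because W is never formed, a matrix-vector product costs O(D log D) time and the layer stores O(D) parameters instead of D^2; and because every entry of W is a linear combination of the entries of g, even a factorized Gaussian over g induces a correlated, non-factorized Gaussian over W.

```python
import numpy as np


def fast_wht(x):
    """Orthonormal fast Walsh-Hadamard transform along the last axis.

    Runs in O(D log D) time; D must be a power of two (assumption of this sketch).
    """
    x = np.array(x, dtype=float)  # copy so the caller's array is not modified
    d = x.shape[-1]
    assert d > 0 and d & (d - 1) == 0, "length must be a power of two"
    h = 1
    while h < d:
        # Butterfly step: pair each block of size h with the next one.
        y = x.reshape(*x.shape[:-1], d // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        x = np.stack((a + b, a - b), axis=-2).reshape(*x.shape[:-1], d)
        h *= 2
    return x / np.sqrt(d)  # scaling makes H orthonormal and symmetric


def whvi_matvec(s1, g, s2, x):
    """Compute W @ x for W = diag(s1) H diag(g) H diag(s2) without forming W."""
    return s1 * fast_wht(g * fast_wht(s2 * x))


def sample_whvi_matvec(s1, g_mean, g_log_std, s2, x, rng):
    """One Monte Carlo sample of W @ x, assuming g ~ N(g_mean, diag(exp(2 * g_log_std)))."""
    eps = rng.standard_normal(g_mean.shape)
    g = g_mean + np.exp(g_log_std) * eps  # reparameterization trick
    return whvi_matvec(s1, g, s2, x)


def dense_hadamard(d):
    """Explicit orthonormal Sylvester-Hadamard matrix, used only for checking."""
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = 8
    s1, s2, x = rng.standard_normal((3, D))
    g_mean, g_log_std = rng.standard_normal(D), np.full(D, -2.0)
    H = dense_hadamard(D)
    # Dense O(D^2) reference built from the same factors.
    W = np.diag(s1) @ H @ np.diag(g_mean) @ H @ np.diag(s2)
    print(np.allclose(W @ x, whvi_matvec(s1, g_mean, s2, x)))  # True
    print(sample_whvi_matvec(s1, g_mean, g_log_std, s2, x, rng))
```

In this sketch the variational parameters per layer are s1, s2, g_mean and g_log_std, i.e. 4D numbers rather than the D^2 means and D^2 variances of a mean-field Gaussian over a dense W. Non-square layers or sizes that are not powers of two would need padding or stacking, which is left out here.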
Type: Thesis
Date: 2022-02-21
Department: Data Science
Eurecom Ref: 6643
Copyright: © EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :