Improving scalability and inference in probabilistic deep models

Rossi, Simone
Thesis

Throughout the last decade, deep learning has reached a sufficient level of maturity
to become the preferred choice for solving machine learning problems and for aiding
decision-making processes. At the same time, deep learning is generally not
equipped with the ability to accurately quantify the uncertainty of its predictions,
which makes these models less suitable for risk-critical applications. A possible
solution to this problem is to employ a Bayesian formulation; however,
while this offers an elegant treatment, it is analytically intractable and requires
approximations. Despite the huge advancements of the last few years, there is
still a long way to go before these approaches become widely applicable. In this thesis, we
address some of the challenges of modern Bayesian deep learning by proposing
and studying solutions to improve the scalability and inference of these models. The
first part of the thesis is dedicated to deep models where inference is carried out
using variational inference (VI). Specifically, we study the role of the initialization
of the variational parameters and we show how careful initialization strategies
can make VI deliver good performance even in large-scale models. In this part
of the thesis we also study the over-regularization effect of the variational objective
on over-parameterized models. To tackle this problem, we propose a novel
parameterization based on the Walsh-Hadamard transform; not only does this mitigate
the over-regularization effect of VI, but it also allows us to model non-factorized
posteriors while keeping time and space complexity under control. The second part
of the thesis is dedicated to a study of the role of priors. While priors are an essential
building block of Bayes’ rule, picking good priors for deep learning models is
generally hard. For this reason, we propose two different strategies, based on (i)
the functional interpretation of neural networks and (ii) a scalable procedure
to perform model selection on the prior hyper-parameters, akin to maximization
of the marginal likelihood. To conclude this part, we analyze a different kind of
Bayesian model (the Gaussian process) and we study the effect of placing a prior on all
the hyper-parameters of these models, including the additional variables required
by the inducing-point approximations. We also show how it is possible to infer
free-form posteriors over these variables, which would conventionally be point-estimated.
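As a concrete illustration of the kind of structured parameterization mentioned above, one common way to exploit the Walsh-Hadamard transform in this line of work is to express a square weight matrix as a product of diagonal and Hadamard factors, e.g. W = diag(s1) H diag(g) H diag(s2), so that matrix-vector products cost O(d log d) time and the parameterization uses O(d) rather than O(d^2) parameters. The sketch below is a minimal NumPy illustration of that idea only; the specific factorization, the names s1, g, s2, and the orthonormal scaling are assumptions made for this example and are not taken from the thesis itself.

    import numpy as np

    def fwht(x):
        # Fast Walsh-Hadamard transform of a length-2^k vector, O(d log d) time.
        x = np.asarray(x, dtype=float).copy()
        d = x.shape[0]
        h = 1
        while h < d:
            for i in range(0, d, 2 * h):
                a = x[i:i + h].copy()
                b = x[i + h:i + 2 * h].copy()
                x[i:i + h] = a + b
                x[i + h:i + 2 * h] = a - b
            h *= 2
        return x / np.sqrt(d)  # orthonormal scaling, so applying fwht twice recovers the input

    def structured_matvec(v, s1, g, s2):
        # Apply W v with W = diag(s1) H diag(g) H diag(s2),
        # without ever materializing the d x d matrix W.
        return s1 * fwht(g * fwht(s2 * v))

    # Tiny usage example (hypothetical values); d must be a power of two.
    d = 8
    rng = np.random.default_rng(0)
    s1, g, s2 = rng.normal(size=(3, d))
    v = rng.normal(size=d)
    print(structured_matvec(v, s1, g, s2))

In a mean-field VI setting such as the one discussed in the first part of the thesis, approximate posteriors would then be placed on the entries of the diagonal factors rather than on a full weight matrix, which is where the savings in the number of variational parameters come from.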

HAL
Type: Thesis
Date: 2022-02-21
Department: Data Science
Eurecom Ref: 6643
Copyright: © EURECOM. Personal use of this material is permitted. The definitive version of this thesis is available at the permalink below.

PERMALINK : https://www.eurecom.fr/publication/6643