Advances of deep Gaussian processes: Calibration and sparsification

Tran, Gia-Lac

Gaussian Processes (GPs) are an attractive approach to non-parametric Bayesian modeling in supervised learning problems. GPs are well known for providing both predictions and predictive uncertainties on a firm mathematical foundation. However, practitioners often avoid GPs because of the limited expressiveness of their kernels and their computational requirements.
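
To make the claim about predictive uncertainty concrete, here is a minimal sketch (not code from the thesis) of exact GP regression on scalar inputs. It assumes a squared-exponential kernel with unit lengthscale and variance and a fixed noise variance, and all function names are hypothetical. It returns the posterior mean and variance at one test input. The Cholesky factorization of the n x n kernel matrix is the O(n^3) step that motivates sparse approximations.

```python
import math


def rbf(x, y, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel on scalar inputs."""
    return variance * math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix: O(n^3)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def backward_sub(L, z):
    """Solve L^T x = z for lower-triangular L."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(xs, ys, x_star, noise=1e-2):
    """Exact GP posterior mean and variance of f(x_star) given noisy targets ys."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)                                 # the O(n^3) bottleneck
    alpha = backward_sub(L, forward_sub(L, ys))     # alpha = (K_xx + noise*I)^{-1} y
    k_star = [rbf(x_star, x) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)                      # v = L^{-1} k_*
    var = rbf(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, var


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [math.sin(x) for x in xs]
print(gp_predict(xs, ys, 0.5))   # near the data: small variance
print(gp_predict(xs, ys, 10.0))  # far from the data: variance close to the prior's 1.0
```

Near the training inputs the posterior variance is small. Far from them the mean reverts to the prior mean (zero) and the variance reverts to the prior variance, which is the calibrated behavior the abstract attributes to GPs.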

To enhance the representational power of kernel functions, integrating (convolutional) neural networks with GPs is a promising solution. As our first contribution, we empirically show that such combinations are miscalibrated, which leads to over-confident predictions. We also propose a novel, well-calibrated way to merge neural structures and GPs using random features and variational inference techniques. In addition, the framework extends naturally to reduce the computational cost by using structured random features. Our proposal not only outperforms prior combinations in terms of calibration but also reaches state-of-the-art results on image classification tasks.
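
The abstract does not say which random features the thesis uses. The sketch below therefore assumes the standard random Fourier features for the RBF kernel, purely as an illustration, and all names are hypothetical. Inner products of these features approximate the kernel, so a linear model f(x) = w . phi(x) with Gaussian weights w approximates a GP. In a deep model, x would presumably be replaced by (convolutional) network features and w given a variational posterior. That part is not shown here.

```python
import math
import random


def sample_rff(num_features, lengthscale=1.0, seed=0):
    """Draw frequencies from the RBF kernel's spectral density N(0, 1/lengthscale^2)
    and phases uniformly on [0, 2*pi]."""
    rng = random.Random(seed)
    omegas = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return omegas, phases


def rff_features(x, omegas, phases):
    """phi(x) = sqrt(2/D) * cos(omega * x + b), so that phi(x) . phi(y) ~= k(x, y)."""
    scale = math.sqrt(2.0 / len(omegas))
    return [scale * math.cos(w * x + b) for w, b in zip(omegas, phases)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


omegas, phases = sample_rff(5000)
phi_x = rff_features(0.3, omegas, phases)
phi_y = rff_features(1.1, omegas, phases)
print(dot(phi_x, phi_y))                  # Monte Carlo estimate of k(0.3, 1.1)
print(math.exp(-0.5 * (0.3 - 1.1) ** 2))  # exact RBF value, about 0.726
```

The approximation error shrinks roughly as 1/sqrt(D) in the number of features D. Computing the feature map is the cost that structured random features are designed to reduce.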

In terms of computational cost, the training cost of exact GPs scales cubically with the number of training points. Inducing point-based GPs are a common choice to mitigate this bottleneck: they select a small set of active (inducing) points that globally summarize the available observations. However, the general case remains elusive, and the number of active points required may still exceed a given computational budget. In our second study, we propose Sparse-within-Sparse Gaussian Processes, which allow approximations with a large number of inducing points without a prohibitive computational cost. An extensive experimental validation demonstrates the effectiveness of our approach compared with state-of-the-art GP-based methods.
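
The following sketch shows, under strong simplifying assumptions, why restricting each input to a few neighboring inducing points keeps the cost manageable when the total number M of inducing points is large. It assumes that a posterior mean m_u over all M inducing outputs is already available. Here that mean is replaced by the true function values at the inducing inputs, purely for illustration. The sketch then computes the conditional mean of f(x*) given only its h nearest inducing points, k_{*h} K_{hh}^{-1} m_u[h]. It is a hypothetical stand-in meant to convey the idea, not the Sparse-within-Sparse GP algorithm or its variational objective. All names are hypothetical.

```python
import math


def rbf(x, y, lengthscale=1.0):
    """Squared-exponential kernel on scalar inputs."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def solve_spd(A, b, jitter=1e-8):
    """Solve A x = b for symmetric positive-definite A via Cholesky: O(h^3) for h x h."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] + jitter - s)  # small jitter for stability
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def local_conditional_mean(x_star, zs, m_u, h):
    """Mean of f(x_star) conditioned only on its h nearest inducing points.

    Only an h x h system is solved, so apart from the neighbor search the
    cost per input does not grow with M = len(zs).
    """
    idx = sorted(range(len(zs)), key=lambda i: abs(zs[i] - x_star))[:h]
    K_hh = [[rbf(zs[i], zs[j]) for j in idx] for i in idx]
    alpha = solve_spd(K_hh, [m_u[i] for i in idx])  # K_hh^{-1} m_u[idx]
    return sum(rbf(x_star, zs[i]) * a for i, a in zip(idx, alpha))


zs = [-10.0 + 0.5 * i for i in range(41)]  # M = 41 inducing inputs on [-10, 10]
m_u = [math.sin(z) for z in zs]            # stand-in for a fitted posterior mean
print(local_conditional_mean(0.3, zs, m_u, h=4), math.sin(0.3))
```

Increasing M here only lengthens the neighbor search. The linear algebra stays at h x h, which is the kind of decoupling between the number of inducing points and the per-point cost that the abstract describes.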

Data Science
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :