In this work, we revisit the theoretical properties of Hamiltonian stochastic differential equations (SDEs) for Bayesian posterior sampling, and we study the two types of errors that arise from numerical SDE simulation: the discretization error and the error due to noisy gradient estimates in the context of data subsampling. We consider overlooked results describing the ergodic convergence rates of numerical integration schemes, and we produce a novel analysis for the effect of mini-batches through the lens of differential operator splitting. In our analysis, the stochastic component of the proposed Hamiltonian SDE is decoupled from the gradient noise, for which we make no normality assumptions. This allows us to derive interesting connections among different sampling schemes, including the original Hamiltonian Monte Carlo (HMC) algorithm, and explain their performance. We show that for a careful selection of numerical integrators, both errors vanish at a rate O(η 2 ), where η is the integrator step size. Our theoretical results are supported by an empirical study on a variety of regression and classification tasks for Bayesian neural networks.
A unified view of stochastic hamiltonian sampling
Submitted to ArXiV, 30 June 2021
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Submitted to ArXiV, 30 June 2021 and is available at :
PERMALINK : https://www.eurecom.fr/publication/6601