🤖 AI Summary
This work addresses the trade-off between bias and computational efficiency in Bayesian posterior mean estimation. We propose an unbiased estimator based on kinetic Langevin dynamics that has finite variance and satisfies a central limit theorem. The method combines advanced splitting integrators, multilevel Monte Carlo, and coupling of Markov chains at different discretization levels, and it accommodates inexact (stochastic or approximate) gradients without requiring Metropolis–Hastings correction or a warm start. Theoretically, we establish an expected gradient complexity of $O(d^{1/4}\varepsilon^{-2})$ for expectations of Lipschitz functions in $d$ dimensions, dimension-independent variance bounds for product distributions, and a computational cost that is independent of dataset size. Empirically, on multinomial regression for MNIST and a Poisson regression model for soccer scores, the number of gradient evaluations per effective sample is independent of dimension, and in large-scale applications the estimator can be two to three orders of magnitude more efficient than randomized Hamiltonian Monte Carlo.
📝 Abstract
We present an unbiased method for Bayesian posterior means based on kinetic Langevin dynamics that combines advanced splitting methods with enhanced gradient approximations. Our approach avoids Metropolis correction by coupling Markov chains at different discretization levels in a multilevel Monte Carlo approach. Theoretical analysis demonstrates that our proposed estimator is unbiased, attains finite variance, and satisfies a central limit theorem. It can achieve accuracy $\epsilon>0$ for estimating expectations of Lipschitz functions in $d$ dimensions with $\mathcal{O}(d^{1/4}\epsilon^{-2})$ expected gradient evaluations, without assuming warm start. We exhibit similar bounds using both approximate and stochastic gradients, and our method's computational cost is shown to scale independently of the size of the dataset. The proposed method is tested using a multinomial regression problem on the MNIST dataset and a Poisson regression model for soccer scores. Experiments indicate that the number of gradient evaluations per effective sample is independent of dimension, even when using inexact gradients. For product distributions, we give dimension-independent variance bounds. Our results demonstrate that in large-scale applications, the unbiased algorithm we present can be 2-3 orders of magnitude more efficient than the "gold-standard" randomized Hamiltonian Monte Carlo.
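To illustrate what coupling chains at different discretization levels can look like, here is a minimal sketch. It is not the paper's algorithm. It assumes a one-dimensional toy potential $U(x) = x^2/2$, swaps in a plain Euler-Maruyama discretization of kinetic Langevin dynamics in place of the advanced splitting methods mentioned above, and uses an arbitrarily chosen friction parameter `gamma`. The function name `coupled_level_pair` and all of its parameters are hypothetical. The fine chain takes two steps of size $h/2$ for every coarse step of size $h$, and the coarse step is driven by the sum of the two fine noise increments, so the two chains follow the same underlying Brownian path.

```python
import math
import random


def coupled_level_pair(h, n_coarse_steps, gamma=2.0, seed=0):
    """Run a coarse chain (step h) and a fine chain (step h/2) of an
    Euler-Maruyama discretization of kinetic Langevin dynamics for the
    toy potential U(x) = x^2 / 2, driven by shared Gaussian noise.

    Returns the final fine position, the final coarse position, and the
    mean squared distance between the two positions along the run.
    """
    rng = random.Random(seed)

    def grad_u(x):
        # Gradient of the toy potential U(x) = x^2 / 2 (an assumption).
        return x

    xc, vc = 0.0, 0.0  # coarse position and velocity
    xf, vf = 0.0, 0.0  # fine position and velocity
    hf = h / 2.0
    sq_diffs = []
    for _ in range(n_coarse_steps):
        xi1 = rng.gauss(0.0, 1.0)
        xi2 = rng.gauss(0.0, 1.0)
        # Fine chain: two half-size steps, each with its own increment.
        for xi in (xi1, xi2):
            xf, vf = (
                xf + hf * vf,
                vf - hf * (grad_u(xf) + gamma * vf)
                + math.sqrt(2.0 * gamma * hf) * xi,
            )
        # Coarse chain: one full step whose noise equals the sum of the
        # two fine increments, rescaled to a standard normal draw.
        xi_c = (xi1 + xi2) / math.sqrt(2.0)
        xc, vc = (
            xc + h * vc,
            vc - h * (grad_u(xc) + gamma * vc)
            + math.sqrt(2.0 * gamma * h) * xi_c,
        )
        sq_diffs.append((xf - xc) ** 2)
    return xf, xc, sum(sq_diffs) / len(sq_diffs)
```

Because the chains share their noise, the difference between fine and coarse positions stays small and shrinks as `h` decreases. A multilevel estimator relies on exactly this behaviour: the level differences in its telescoping sum then have small variance, so most of the computational effort can be spent on the cheap coarse levels.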