🤖 AI Summary
This paper investigates the non-asymptotic convergence of Langevin Monte Carlo (LMC) for learning two-layer neural networks. The analysis covers networks of arbitrary width and arbitrary data distributions, and introduces a constant-strength Frobenius-norm regularization, independent of network size, under which the loss satisfies the Villani conditions and its Gibbs measure therefore obeys a Poincaré inequality, without imposing constraints on network architecture or data structure. Using smooth activation functions, the paper establishes non-asymptotic convergence rates of the LMC iterates to the target Gibbs distribution in both total variation (TV) distance and q-Rényi divergence. The results apply uniformly to classification and regression tasks, removing prior assumptions on network width or data distribution, and advance the theoretical understanding of MCMC methods for deep learning.
📝 Abstract
In this work, we establish that the Langevin Monte Carlo algorithm can learn depth-2 neural nets of any size, for any data, and we give non-asymptotic convergence rates for it. We achieve this by showing that, in total variation distance and q-Rényi divergence, the iterates of Langevin Monte Carlo converge to the Gibbs distribution of the Frobenius-norm regularized loss for any such net, with smooth activations, in both classification and regression settings. Most critically, the amount of regularization needed for our results is independent of the size of the net. Our key observation is that two-layer neural loss functions can always be regularized by a constant amount so that they satisfy the Villani conditions, and thus their Gibbs measures satisfy a Poincaré inequality.
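To make the object of study concrete, the LMC iterates on a Frobenius-norm regularized depth-2 net loss can be sketched as below. This is an illustrative NumPy implementation, not the paper's exact construction: the toy data, the tanh activation, and all hyperparameters (step size `eta`, inverse temperature `beta`, regularization strength `lam`) are assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(W, a, X):
    """Depth-2 net with a smooth activation: f(x) = a . tanh(W x)."""
    return np.tanh(X @ W.T) @ a

def reg_loss(W, a, X, y, lam):
    """Empirical squared loss plus constant-strength Frobenius-norm regularization."""
    r = forward(W, a, X) - y
    return 0.5 * np.mean(r ** 2) + lam * (np.sum(W ** 2) + np.sum(a ** 2))

def grads(W, a, X, y, lam):
    """Analytic gradients of reg_loss with respect to W and a."""
    H = np.tanh(X @ W.T)                       # hidden activations, shape (n, m)
    r = H @ a - y                              # residuals, shape (n,)
    n = len(y)
    gA = H.T @ r / n + 2 * lam * a
    gZ = (r[:, None] * a[None, :]) * (1 - H ** 2) / n
    gW = gZ.T @ X + 2 * lam * W
    return gW, gA

def lmc_step(W, a, X, y, lam, eta, beta):
    """One LMC iterate: a gradient step plus Gaussian noise of scale sqrt(2*eta/beta)."""
    gW, gA = grads(W, a, X, y, lam)
    W = W - eta * gW + np.sqrt(2 * eta / beta) * rng.standard_normal(W.shape)
    a = a - eta * gA + np.sqrt(2 * eta / beta) * rng.standard_normal(a.shape)
    return W, a

# Toy regression data (illustrative only): n samples, input dim d, width m
n, d, m = 64, 3, 8
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0])
W = 0.1 * rng.standard_normal((m, d))
a = 0.1 * rng.standard_normal(m)

losses = [reg_loss(W, a, X, y, lam=0.01)]
for _ in range(200):
    W, a = lmc_step(W, a, X, y, lam=0.01, eta=0.05, beta=1e4)
    losses.append(reg_loss(W, a, X, y, lam=0.01))
```

At low temperature (large `beta`) the iterates behave like noisy gradient descent on the regularized loss; the paper's result is that, for an appropriate constant `lam`, the law of these iterates converges non-asymptotically to the corresponding Gibbs distribution in TV and q-Rényi metrics.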