Langevin Monte-Carlo Provably Learns Depth Two Neural Nets at Any Size and Data

📅 2025-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the non-asymptotic convergence of Langevin Monte Carlo (LMC) for learning two-layer neural networks. It considers networks of arbitrary width and arbitrary data distributions and shows that adding a constant amount of Frobenius-norm regularization, independent of network size, makes the loss satisfy the Villani condition, so that its Gibbs measure obeys a Poincaré inequality without any constraints on architecture or data structure. For smooth activation functions, the analysis establishes non-asymptotic convergence rates of the LMC iterates to the target Gibbs distribution in both total variation (TV) distance and q-Rényi divergence. The results apply uniformly to classification and regression tasks, removing prior assumptions on network width or data distribution, and advance the theoretical understanding of MCMC methods for deep learning.
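
The summary describes plain Langevin Monte Carlo iterates targeting the Gibbs measure of a Frobenius-norm regularized two-layer network loss. The snippet below is a minimal illustrative sketch of that setup, not the authors' code: the toy data, the tanh activation, the regularization constant lam, the inverse temperature beta, and the step size are all assumptions made for the example.

```python
# Minimal sketch (assumed setup, not the paper's code): LMC on a
# Frobenius-norm regularized depth-2 network loss with a smooth activation.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data, chosen only for illustration.
n, d, width = 64, 5, 32
X = rng.normal(size=(n, d))
y = np.sin(X @ rng.normal(size=d))

def unpack(theta):
    """Split the flat parameter vector into the two weight layers."""
    W = theta[: width * d].reshape(width, d)   # first layer
    a = theta[width * d:]                      # second (output) layer
    return W, a

def loss(theta, lam=1.0):
    """Squared loss of a depth-2 tanh net plus a Frobenius-norm
    regularizer; lam is a hypothetical constant, not the paper's value."""
    W, a = unpack(theta)
    pred = np.tanh(X @ W.T) @ a
    return 0.5 * np.mean((pred - y) ** 2) + 0.5 * lam * np.sum(theta ** 2)

def grad(theta, eps=1e-5):
    """Finite-difference gradient; an analytic gradient would be used in practice."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (loss(theta + e) - loss(theta - e)) / (2 * eps)
    return g

# Unadjusted Langevin step targeting the Gibbs measure proportional to
# exp(-beta * loss): theta <- theta - h*beta*grad + sqrt(2h)*noise.
beta, step, iters = 4.0, 1e-2, 200
theta = rng.normal(size=width * d + width) * 0.1
for _ in range(iters):
    noise = rng.normal(size=theta.shape)
    theta = theta - step * beta * grad(theta) + np.sqrt(2 * step) * noise

print("final regularized loss:", loss(theta))
```

The update rule is the standard unadjusted Langevin iteration; the paper's contribution is the convergence analysis of such iterates, not a new sampler.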

📝 Abstract
In this work, we will establish that the Langevin Monte-Carlo algorithm can learn depth-2 neural nets of any size and for any data and we give non-asymptotic convergence rates for it. We achieve this via showing that under Total Variation distance and q-Rényi divergence, the iterates of Langevin Monte Carlo converge to the Gibbs distribution of Frobenius norm regularized losses for any of these nets, when using smooth activations and in both classification and regression settings. Most critically, the amount of regularization needed for our results is independent of the size of the net. The key observation of ours is that two layer neural loss functions can always be regularized by a constant amount such that they satisfy the Villani conditions, and thus their Gibbs measures satisfy a Poincaré inequality.
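
To make the key observation of the abstract concrete, the regularized loss, its Gibbs measure, and the Poincaré inequality that measure is claimed to satisfy can be written as below; the symbols λ, β, and C_PI are generic placeholders, not the paper's exact notation.

```latex
% Illustrative notation, assumed rather than copied from the paper.
\tilde{L}(w) \;=\; L(w) + \tfrac{\lambda}{2}\,\lVert w \rVert_F^{2},
\qquad
\mathrm{d}\mu_\beta(w) \;\propto\; e^{-\beta \tilde{L}(w)}\,\mathrm{d}w,
\qquad
\operatorname{Var}_{\mu_\beta}(g) \;\le\; C_{\mathrm{PI}} \int \lVert \nabla g \rVert^{2}\,\mathrm{d}\mu_\beta
\quad \text{for all smooth } g .
```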
Problem

Research questions and friction points this paper is trying to address.

Can Langevin Monte-Carlo provably learn depth-2 neural nets?
Do non-asymptotic convergence rates hold for networks of any width and arbitrary data distributions?
Can the required regularization be kept at a constant level, independent of network size?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proof that Langevin Monte-Carlo learns depth-2 neural nets of any width on any data
Non-asymptotic convergence of LMC iterates to the Gibbs measure in Total Variation and q-Rényi divergence
Constant Frobenius-norm regularization, independent of network size, yielding the Villani condition and a Poincaré inequality