Quantitative convergence of trained single layer neural networks to Gaussian processes

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work quantifies the convergence of the output distribution of shallow neural networks under gradient descent training to its corresponding Gaussian process (GP) in the infinite-width limit. Specifically, it establishes, for the first time, an explicit upper bound on the quadratic Wasserstein distance between the network's output distribution and the GP at any training time $t \geq 0$, revealing a polynomial decay rate of $O(m^{-1/2} + d^{1/2}m^{-1/2})$ in network width $m$ and input dimension $d$. Methodologically, the analysis integrates gradient descent dynamics, probability metric theory, and asymptotic expansions in the infinite-width regime, overcoming prior limitations that restricted analysis to initialization or equilibrium. The result provides a precise finite-width error characterization, substantially strengthening the theoretical foundation of neural-Gaussian process equivalence within the neural tangent kernel (NTK) framework.
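
In schematic form, the bound described above can be read as follows; the constant $C(t)$ and the exact quantity compared under $W_2$ are illustrative placeholders, since the precise hypotheses and constants are those of the paper's theorem:

```latex
% Schematic form of the bound (C(t) and the exact statement are assumptions)
\[
W_2\!\left(\mathrm{Law}\big(f_t^{(m)}(x)\big),\,
           \mathrm{Law}\big(G_t(x)\big)\right)
  \;\le\; C(t)\left(m^{-1/2} + d^{1/2}\,m^{-1/2}\right),
  \qquad t \geq 0 .
\]
```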

📝 Abstract
In this paper, we study the quantitative convergence of shallow neural networks trained via gradient descent to their associated Gaussian processes in the infinite-width limit. While previous work has established qualitative convergence under broad settings, precise, finite-width estimates remain limited, particularly during training. We provide explicit upper bounds on the quadratic Wasserstein distance between the network output and its Gaussian approximation at any training time $t \geq 0$, demonstrating polynomial decay with network width. Our results quantify how architectural parameters, such as width and input dimension, influence convergence, and how training dynamics affect the approximation error.
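
As a sanity check of the width dependence at initialization (the $t = 0$ case), the following minimal NumPy sketch estimates the quadratic Wasserstein distance between a width-$m$ shallow ReLU network's output at a fixed unit-norm input and its limiting Gaussian; the architecture, activation, and sample sizes are assumptions for illustration, not the paper's setup.

```python
# Monte Carlo sketch of the t = 0 case: the quadratic Wasserstein (W2)
# distance between a width-m shallow ReLU network's output at a fixed
# unit-norm input and its limiting Gaussian should shrink roughly like
# m^{-1/2}. Hypothetical setup; the paper's bound also covers t > 0.
import numpy as np

rng = np.random.default_rng(0)

def sample_outputs(m, n_samples):
    """Draw n_samples of f_m(x) = (1/sqrt(m)) * sum_i a_i * relu(w_i . x).
    For a unit-norm input x and w_i ~ N(0, I_d), each w_i . x is itself a
    standard normal, so the pre-activations can be sampled directly."""
    z = rng.standard_normal((n_samples, m))   # pre-activations w_i . x
    a = rng.standard_normal((n_samples, m))   # output-layer weights
    return (a * np.maximum(z, 0.0)).sum(axis=1) / np.sqrt(m)

def w2_empirical(u, v):
    """W2 between two equal-size 1-D samples via the sorted (monotone)
    coupling, which is exact for empirical measures on the line."""
    return np.sqrt(np.mean((np.sort(u) - np.sort(v)) ** 2))

n = 50_000
# Limiting GP value at x: N(0, K(x, x)) with K(x, x) = E[relu(z)^2] = 1/2.
gp = np.sqrt(0.5) * rng.standard_normal(n)

for m in [16, 64, 256, 1024]:
    print(f"m = {m:5d}   W2 ~ {w2_empirical(sample_outputs(m, n), gp):.4f}")
```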
Problem

Research questions and friction points this paper is trying to address.

Quantify convergence of shallow neural networks to Gaussian processes
Establish explicit bounds on approximation error during training
Analyze how architectural parameters affect convergence rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantitative convergence bounds for neural networks
Polynomial decay with network width demonstrated
Training dynamics impact on approximation error quantified (see the sketch after this list)
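
To make the trained-network side of the claim concrete, here is a hedged NumPy sketch: an ensemble of width-$m$ shallow ReLU networks is trained by full-batch gradient descent on a toy regression task, and the ensemble's output distribution at a test input is compared to a moment-matched Gaussian. All hyperparameters are hypothetical, and the moment-matched reference only probes approximate Gaussianity of the law at time $t > 0$; it does not reproduce the paper's limiting GP, which would require the NTK flow.

```python
# Sketch: after a few gradient-descent steps, the output of a wide shallow
# ReLU net at a test input still looks Gaussian across random inits.
# Heuristic check only; the reference Gaussian is moment-matched to the
# ensemble, not derived from the NTK training flow.
import numpy as np

rng = np.random.default_rng(1)
d, m, n_train, steps, lr = 5, 512, 8, 50, 0.5   # hypothetical hyperparameters
X = rng.standard_normal((n_train, d)) / np.sqrt(d)
y = np.sin(X.sum(axis=1))                        # toy regression targets
x_test = rng.standard_normal(d) / np.sqrt(d)

def train_and_eval():
    """Train one width-m net f(x) = (1/sqrt(m)) a . relu(Wx) by full-batch
    GD on squared loss; return its output at x_test."""
    W = rng.standard_normal((m, d))
    a = rng.standard_normal(m)
    for _ in range(steps):
        H = np.maximum(X @ W.T, 0.0)             # (n_train, m) activations
        r = H @ a / np.sqrt(m) - y               # residuals
        grad_a = H.T @ r / (np.sqrt(m) * n_train)
        mask = (H > 0).astype(float)             # ReLU derivative
        grad_W = ((r[:, None] * mask) * a).T @ X / (np.sqrt(m) * n_train)
        a -= lr * grad_a
        W -= lr * grad_W
    return np.maximum(x_test @ W.T, 0.0) @ a / np.sqrt(m)

outs = np.array([train_and_eval() for _ in range(2000)])
ref = outs.mean() + outs.std() * rng.standard_normal(outs.size)
w2 = np.sqrt(np.mean((np.sort(outs) - np.sort(ref)) ** 2))
print(f"trained output at x_test: mean {outs.mean():+.3f}, "
      f"std {outs.std():.3f}, W2 to moment-matched Gaussian ~ {w2:.4f}")
```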