Entropic bounds for conditionally Gaussian vectors and applications to neural networks

📅 2025-04-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the convergence rate of randomly initialized fully connected neural networks, and their derivatives, toward Gaussian processes over finite input sets, as well as the total variation (TV) distance between the Bayesian posterior distribution and its Gaussian limit. Methodologically, it combines quantitative cumulant bounds with information-theoretic entropy inequalities to derive new upper bounds on the TV and 2-Wasserstein distances between conditionally Gaussian vectors and Gaussian laws with invertible covariance. The theoretical contributions are threefold: (1) under mild assumptions on the activation function, it establishes optimal convergence rates toward Gaussianity for both network outputs and their derivatives; (2) it provides a quantitative central limit theorem for Bayesian posteriors in the TV metric, with explicit, non-asymptotic error bounds (a quantitative version of the posterior CLT of Hron et al., 2022); and (3) it unifies, improves upon, and extends state-of-the-art results on Gaussian approximations and Bayesian asymptotics in deep learning.
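As a rough illustration of the phenomenon the paper quantifies (not the paper's code; the function names, initialization choices, and widths below are our own assumptions), the sketch samples the scalar output of a randomly initialized fully connected network at a small, fixed set of inputs and checks that the marginals look increasingly Gaussian as the hidden widths grow, using excess kurtosis as a crude proxy.

```python
# Hypothetical illustration: outputs of a wide random MLP at a finite input
# set become approximately Gaussian as the hidden widths diverge.
import numpy as np

rng = np.random.default_rng(0)

def random_mlp_outputs(xs, widths, n_samples=2000, activation=np.tanh):
    """Sample the scalar output of a random MLP at each input row of `xs`.

    Weights ~ N(0, 1/fan_in), biases ~ N(0, 1): a standard Gaussian
    initialization under which the infinite-width limit is a Gaussian process.
    """
    d_in = xs.shape[1]
    outs = np.empty((n_samples, xs.shape[0]))
    for s in range(n_samples):
        h, fan_in = xs, d_in
        for w in widths:
            W = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, w))
            b = rng.normal(0.0, 1.0, size=w)
            h = activation(h @ W + b)
            fan_in = w
        W = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, 1))
        b = rng.normal(0.0, 1.0, size=1)
        outs[s] = (h @ W + b).ravel()
    return outs

xs = np.array([[1.0, 0.0], [0.5, 0.5]])   # finite input set (two points)
for n in (8, 64, 256):                    # growing hidden width
    samples = random_mlp_outputs(xs, widths=(n, n))
    z = (samples - samples.mean(0)) / samples.std(0)
    # Excess kurtosis of each marginal: close to 0 for a Gaussian.
    print(n, np.round((z**4).mean(axis=0) - 3.0, 3))
```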

📝 Abstract
Using entropic inequalities from information theory, we provide new bounds on the total variation and 2-Wasserstein distances between a conditionally Gaussian law and a Gaussian law with invertible covariance matrix. We apply our results to quantify the speed of convergence to Gaussian of a randomly initialized fully connected neural network and its derivatives, evaluated at a finite number of inputs, when the initialization is Gaussian and the sizes of the inner layers diverge to infinity. Our results require mild assumptions on the activation function, and allow one to recover optimal rates of convergence in a variety of distances, thus improving and extending the findings of Basteri and Trevisan (2023), Favaro et al. (2023), Trevisan (2024) and Apollonio et al. (2024). Among our main tools are the quantitative cumulant estimates established in Hanin (2024). As an illustration, we apply our results to bound the total variation distance between the Bayesian posterior law of the neural network and its derivatives, and the posterior law of the corresponding Gaussian limit: this yields quantitative versions of a posterior CLT by Hron et al. (2022), and extends several estimates by Trevisan (2024) to the total variation metric.
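For context, the "entropic inequalities" mentioned in the abstract are presumably of the same general shape as the classical Pinsker and Talagrand inequalities, which control total variation and 2-Wasserstein distance by relative entropy with respect to a Gaussian reference measure; the display below records these standard inequalities (for the standard Gaussian $\gamma$ on $\mathbb{R}^d$), not the paper's actual statements.

```latex
% Standard entropic inequalities (Pinsker; Talagrand's transport inequality
% for the standard Gaussian measure \gamma on \mathbb{R}^d):
\[
  \mathrm{TV}(P,\gamma) \;\le\; \sqrt{\tfrac{1}{2}\, D_{\mathrm{KL}}(P\,\|\,\gamma)},
  \qquad
  W_2(P,\gamma)^2 \;\le\; 2\, D_{\mathrm{KL}}(P\,\|\,\gamma).
\]
```

In this spirit, a bound on the relative entropy of the (conditionally Gaussian) law of the network with respect to its Gaussian limit translates directly into bounds in both metrics.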
Problem

Research questions and friction points this paper is trying to address.

How to bound the total variation and 2-Wasserstein distances between conditionally Gaussian and Gaussian laws
How fast randomly initialized wide neural networks and their derivatives converge to their Gaussian process limit
How close the Bayesian posterior of a wide neural network is to the posterior of its Gaussian limit
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropic (relative-entropy) inequalities yielding new TV and 2-Wasserstein bounds for conditionally Gaussian vectors
Optimal convergence rates for wide-network outputs and derivatives, via quantitative cumulant estimates
Quantitative posterior CLT: TV bounds between the network's Bayesian posterior and the posterior of its Gaussian limit