🤖 AI Summary
Existing Gaussian process limits, such as the Neural Network Gaussian Process (NNGP), fail to capture rare yet dominant non-Gaussian fluctuations and their associated feature learning mechanisms in the posterior of wide Bayesian neural networks. This work introduces large deviation theory to this domain for the first time, formulating a rate function at the level of predictive functions as a variational objective. It proposes a novel framework that jointly optimizes the predictor and a data-dependent kernel, thereby moving beyond the conventional assumption of a fixed kernel. This approach enables a functional-level characterization of posterior non-Gaussianity, accurately accounts for finite-width effects in moderately wide networks, and successfully captures key phenomena such as posterior deformation, non-Gaussian tails, and adaptive kernel selection.
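To make the fixed-kernel vs. joint-optimization contrast concrete, here is a rough schematic in LaTeX. The notation is illustrative only and not taken from the paper: $f^{\ast}$ stands for the predictor evaluated on the data, $K$ ranges over internal (hidden-layer) kernels, and $\mathcal{C}(K)$ is a placeholder for whatever complexity cost the theory assigns to a kernel.

```latex
% Schematic only: illustrative notation, not the paper's formulas.
% Fixed-kernel (NNGP) picture: the rate function is a quadratic form
% in the predictor, with the kernel K_NNGP held fixed.
I_{\mathrm{NNGP}}(f^{\ast}) \;=\; \tfrac{1}{2}\,(f^{\ast})^{\top} K_{\mathrm{NNGP}}^{-1}\, f^{\ast}

% Joint-optimization picture described above: the kernel is itself a
% variational degree of freedom, penalized by a complexity cost C(K),
% so the rate function selects the predictor and the kernel together.
I(f^{\ast}) \;=\; \inf_{K} \Big[ \tfrac{1}{2}\,(f^{\ast})^{\top} K^{-1} f^{\ast} \;+\; \mathcal{C}(K) \Big]
```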
📝 Abstract
We study wide Bayesian neural networks, focusing on the rare but statistically dominant fluctuations that govern posterior concentration beyond Gaussian-process limits. Large-deviation theory provides explicit variational objectives (rate functions) on predictors, yielding a notion of complexity and feature learning that emerges directly at the functional level. We show that the posterior output rate function is obtained by a joint optimization over predictors and internal kernels, in contrast with fixed-kernel (NNGP) theory. Numerical experiments demonstrate that the resulting predictions accurately describe finite-width behavior for moderately sized networks, capturing non-Gaussian tails, posterior deformation, and data-dependent kernel selection effects.
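As a minimal numerical sketch of the "non-Gaussian tails at finite width" point (a toy illustration, not the paper's model or experiments), the snippet below samples the prior output of a one-hidden-layer network at a single input and compares its upper-tail probability with the Gaussian (NNGP-style) prediction at matched variance. The widths, priors, activation, and threshold are arbitrary choices made here for illustration.

```python
import math
import numpy as np

# Toy sketch (assumed setup, not from the paper): prior output of a
# one-hidden-layer network f(x) = width^{-1/2} * sum_i v_i * tanh(w_i x)
# with standard normal priors on v_i and w_i, evaluated at a single input.
rng = np.random.default_rng(0)

def sample_outputs(width, n_samples, x=1.0):
    """Draw prior samples of f(x) for a given hidden-layer width."""
    w = rng.standard_normal((n_samples, width))
    v = rng.standard_normal((n_samples, width))
    return (v * np.tanh(w * x)).sum(axis=1) / math.sqrt(width)

n_samples, threshold = 200_000, 3.0            # upper tail P(f > threshold * std)
gaussian_tail = 0.5 * math.erfc(threshold / math.sqrt(2.0))  # matched-variance Gaussian

for width in (10, 100, 1000):
    # Sample in batches so memory stays bounded at large widths.
    batches = [sample_outputs(width, 20_000) for _ in range(n_samples // 20_000)]
    f = np.concatenate(batches)
    empirical_tail = np.mean(f > threshold * f.std())
    # At small widths the empirical tail deviates from the Gaussian value;
    # the gap shrinks as the width grows toward the NNGP limit.
    print(f"width={width:5d}  empirical={empirical_tail:.2e}  gaussian={gaussian_tail:.2e}")
```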