Bayesian Inference with Shaped Deep Non-linear MLPs

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies the Bayesian predictive behavior of deep nonlinear MLPs in the regime where the input dimension, network width, depth, and training set size can all be large, a setting in which the limits of diverging parameter count and dataset size do not commute. Building on the Neural Covariance Stochastic Differential Equation (SDE) framework, the analysis covers both smooth and ReLU activation functions at arbitrary temperature. The ratio $LP/N$ (number of hidden layers times number of training samples, divided by hidden layer width) plays the role of an effective depth. To first order in $LP/N$, the study gives a simple criterion for which data-generating processes benefit from depth, in the sense that larger $LP/N$ increases the Bayesian model evidence. It also gives a new derivation of a prior result from the physics literature: at least to first order in $LP/N$, the Bayesian posterior predictive is equivalent to that of a data-dependent kernel method.

📝 Abstract

A central aim of deep learning theory is to characterize how neural networks make predictions in the regime of simultaneously large model and training set size. Since the limits of diverging number of model parameters and diverging dataset size do not commute, it is not clear a priori what limits exist. In this work, we shed new light on these questions by studying Bayesian inference in deep non-linear MLPs in the regime where the number of training samples ($P$), the input dimension ($N_0$), the hidden layer width ($N$), and the number of hidden layers ($L$) can all be large. We build on the Neural Covariance SDE (Li et al., 2022) to analyze predictive posteriors in the regime where $LP/N \in \Theta(1)$, with the ratio $LP/N$ playing the role of an effective network depth. Our framework covers both smooth and ReLU activation functions and applies to arbitrary temperature. We find, to first order in $LP/N$, a simple criterion for which data-generating processes benefit from depth, in the sense that larger $LP/N$ increases the Bayesian model evidence. We also give a novel derivation of a prior result from the physics literature: at least to first order in $LP/N$, the Bayesian predictive posterior is remarkably simple, being equivalent to that of a data-dependent kernel method.
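To make the two quantities in the abstract concrete, here is a minimal sketch of (a) the effective depth $LP/N$ and (b) the predictive distribution of a generic kernel method. It is an illustration under stated assumptions, not the paper's construction: the abstract does not give the data-dependent kernel, so an RBF kernel stands in as a placeholder, and the temperature is assumed to enter as a ridge (observation-noise) term. The function names `effective_depth`, `rbf_kernel`, and `kernel_predictive` are hypothetical.

```python
import numpy as np


def effective_depth(L: int, P: int, N: int) -> float:
    """Ratio LP/N: hidden layers times training samples, over hidden width."""
    return L * P / N


def rbf_kernel(A: np.ndarray, B: np.ndarray, lengthscale: float = 1.0) -> np.ndarray:
    """Placeholder kernel (an assumption). The paper's kernel is data-dependent
    and is not specified in the abstract."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))


def kernel_predictive(X_train, y_train, X_test, kernel, temperature=1e-2):
    """Standard kernel (Gaussian process) regression predictive.

    Assumption: the temperature acts like an observation-noise variance,
    i.e. it is added to the diagonal of the training kernel matrix.
    """
    K = kernel(X_train, X_train)      # (P, P) train/train kernel
    K_star = kernel(X_test, X_train)  # (M, P) test/train kernel
    K_ss = kernel(X_test, X_test)     # (M, M) test/test kernel
    A = K + temperature * np.eye(len(X_train))
    mean = K_star @ np.linalg.solve(A, y_train)
    cov = K_ss - K_star @ np.linalg.solve(A, K_star.T)
    return mean, cov
```

For example, a network with `L = 4` hidden layers of width `N = 200` trained on `P = 100` samples has `effective_depth(4, 100, 200) == 2.0`. Doubling the width while holding `L` and `P` fixed halves the effective depth, which is why the abstract treats width, depth, and sample count jointly rather than one at a time.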

Problem

Research questions and friction points this paper is trying to address.

Bayesian inference

deep learning theory

large-scale regime

neural networks

model evidence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian inference

deep MLPs

Neural Covariance SDE