🤖 AI Summary
This work addresses the high computational cost of uncertainty quantification in neural network predictions by proposing a method that estimates predictive uncertainty directly, without training multiple networks. By analyzing the limiting fluctuation process of wide two-layer neural networks trained by stochastic gradient descent in a weak-noise regime, the study gives an explicit characterization of the law of this limiting process: it is a centered Gaussian process in the dual of a weighted Sobolev space, with a closed-form expression for the covariance of its finite-dimensional distributions. The derivation combines a trajectorial central limit theorem, weak solutions of a linear stochastic evolution equation, and a backward transport equation with a nonlocal source term. It yields an analytical expression for the asymptotic variance of the network-output fluctuations, which is illustrated numerically on a one-dimensional regression example.
📝 Abstract
Uncertainty quantification in neural network predictions is a central issue in many applications. Our approach aims to reduce computational cost by evaluating uncertainty directly from PDE-based information on the asymptotic variance, rather than relying on the deep ensemble method, which can be viewed as a Monte Carlo estimate of the prediction and requires training multiple networks. We thus study the law of the limiting process describing the random fluctuations around the mean-field limit of wide two-layer neural networks trained by stochastic gradient descent in a weak-noise regime. Building on a recent trajectorial central limit theorem, in which this limit is characterized as the weak solution of a linear stochastic evolution equation, we identify its law explicitly. More precisely, we show that it is a centered Gaussian process in the dual of a weighted Sobolev space, and we derive a closed-form covariance representation for the finite-dimensional distributions obtained by testing it against smooth functions. This covariance is expressed through the solution of a backward transport equation with a nonlocal source term, whose coefficients are driven by the mean-field trajectory. As a consequence, by testing against the activation function at a fixed input, we obtain an expression for the limiting variance of the corresponding network-output fluctuations. We illustrate this result numerically on a one-dimensional regression example.
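For context, the deep ensemble baseline that the abstract contrasts with can be sketched as follows. This is a minimal illustration and not the authors' code: the tanh activation, the sine regression target, the hyperparameters, and the helper names `train_two_layer` and `predict` are all assumptions made for this example. It trains several independently initialized two-layer networks with mean-field (1/N) output scaling by single-sample SGD, then uses the empirical variance across members as a Monte Carlo estimate of the predictive variance.

```python
import numpy as np

def train_two_layer(x, y, n_hidden, lr, n_steps, rng):
    """Train f(x) = (1/N) * sum_i a_i * tanh(w_i * x + b_i) by single-sample SGD.

    Hypothetical helper for illustration; activation and scaling are assumptions.
    """
    a = rng.normal(size=n_hidden)
    w = rng.normal(size=n_hidden)
    b = rng.normal(size=n_hidden)
    for _ in range(n_steps):
        k = rng.integers(len(x))           # draw one training sample
        h = np.tanh(w * x[k] + b)          # hidden activations, shape (N,)
        err = np.mean(a * h) - y[k]        # residual of the 1/N-scaled output
        # Per-neuron gradients of 0.5 * err**2, rescaled by N (mean-field time scale).
        grad_a = err * h
        grad_pre = err * a * (1.0 - h**2)
        a -= lr * grad_a
        w -= lr * grad_pre * x[k]
        b -= lr * grad_pre
    return a, w, b

def predict(params, x):
    """Evaluate the trained network on an array of 1D inputs."""
    a, w, b = params
    return np.mean(a[None, :] * np.tanh(np.outer(x, w) + b[None, :]), axis=1)

# Synthetic one-dimensional regression data (assumed target, for illustration only).
data_rng = np.random.default_rng(0)
x_train = data_rng.uniform(-1.0, 1.0, size=64)
y_train = np.sin(3.0 * x_train) + 0.1 * data_rng.normal(size=64)
x_test = np.linspace(-1.0, 1.0, 5)

# Deep ensemble: one full training run per member, each with its own seed.
n_members = 8
preds = np.stack([
    predict(
        train_two_layer(x_train, y_train, n_hidden=200, lr=0.1,
                        n_steps=2000, rng=np.random.default_rng(seed)),
        x_test,
    )
    for seed in range(n_members)
])

ens_mean = preds.mean(axis=0)          # Monte Carlo estimate of the prediction
ens_var = preds.var(axis=0, ddof=1)    # Monte Carlo estimate of its variance
print(ens_mean)
print(ens_var)
```

The cost of this baseline grows linearly with `n_members`, since each member is a separate training run. That cost is what the abstract's variance expression is meant to avoid.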