🤖 AI Summary
Variational mean-field approximations suffer from training instability, poor predictive accuracy, and miscalibrated uncertainty in overparameterized deep neural networks because they neglect parameter correlations. To address this, we propose a projection-based deep variational inference framework that constructs a fully correlated posterior approximation, explicitly capturing the overparameterized structure. We introduce two orthogonal linear subspaces—one characterizing in-distribution functional variation and the other out-of-distribution variation—enabling a principled separation of learning dynamics. Our variational family features interpretable hyperparameters and combines reparameterization with ELBO optimization for scalable, efficient inference. Theoretically grounded and computationally tractable, our method consistently outperforms strong baselines across multiple tasks, architectures, and datasets, achieving state-of-the-art predictive accuracy and uncertainty calibration and demonstrating the practical viability of Bayesian deep learning.
📝 Abstract
Variational mean-field approximations tend to struggle with contemporary overparametrized deep neural networks. Where a Bayesian treatment is usually associated with high-quality predictions and uncertainties, the practical reality has been the opposite, with unstable training, poor predictive power, and subpar calibration. Building upon recent work on reparametrizations of neural networks, we propose a simple variational family that considers two independent linear subspaces of the parameter space. These represent functional changes inside and outside the support of the training data. This allows us to build a fully correlated approximate posterior that reflects the overparametrization and exposes easy-to-interpret hyperparameters. We develop scalable numerical routines that maximize the associated evidence lower bound (ELBO) and sample from the approximate posterior. Empirically, we observe state-of-the-art performance across tasks, models, and datasets compared to a wide array of baseline methods. Our results show that approximate Bayesian inference applied to deep neural networks is far from a lost cause when the inference mechanism reflects the geometry of reparametrizations.
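To make the construction concrete, the following is a minimal illustrative sketch (not the paper's actual implementation) of the core idea: parameters are decomposed along two mutually orthogonal linear subspaces, each with its own interpretable scale hyperparameter, and posterior samples are drawn via the reparameterization trick. All dimensions, variable names, and the choice of random orthonormal bases are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: D parameters, a k-dim "in-support" subspace and an
# m-dim "out-of-support" subspace (names illustrative, not the paper's API).
D, k, m = 10, 3, 2

# Build two mutually orthogonal linear subspaces via a QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((D, k + m)))
U_in, U_out = Q[:, :k], Q[:, k:]      # orthonormal, mutually orthogonal bases

theta_map = rng.standard_normal(D)    # stand-in for a point estimate of the weights
sigma_in, sigma_out = 0.1, 1.0        # easy-to-interpret per-subspace scales

def sample_posterior():
    """Reparameterized draw: theta = mu + U_in*(s_in*eps) + U_out*(s_out*eps)."""
    eps_in = rng.standard_normal(k)
    eps_out = rng.standard_normal(m)
    return theta_map + U_in @ (sigma_in * eps_in) + U_out @ (sigma_out * eps_out)

theta = sample_posterior()
# The induced covariance, sigma_in^2 U_in U_in^T + sigma_out^2 U_out U_out^T,
# is low-rank yet fully correlated across parameter coordinates.
```

Because the sample is a deterministic, differentiable function of the noise, the two scale hyperparameters (and, in a full method, the subspaces themselves) can be tuned by stochastic gradient ascent on the ELBO.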