🤖 AI Summary
Problem: Bayesian neural networks (BNNs) suffer from uninterpretable parameter priors, while Gaussian processes (GPs) face scalability limitations on large datasets. Method: We propose the Mercer prior, an interpretable prior over BNN parameters constructed directly in parameter space from the Mercer decomposition of a user-specified covariance kernel. It imposes no architectural constraints on the network, fits into standard BNN training pipelines, and is built so that samples from the resulting BNN approximate those of the target GP. Contribution/Results: Experiments demonstrate that the Mercer prior makes the prior interpretable, tying it explicitly to output behavior, while preserving strong uncertainty quantification. It also retains the computational scalability of BNNs, thereby combining the interpretability of GPs with the scalability of BNNs.
📝 Abstract
Quantifying the uncertainty in the output of a neural network is essential for deployment in scientific or engineering applications where decisions must be made under limited or noisy data. Bayesian neural networks (BNNs) provide a framework for this purpose by constructing a Bayesian posterior distribution over the network parameters. However, the prior, which is of key importance in any Bayesian setting, is rarely meaningful for BNNs. This is because the complexity of the input-to-output map of a BNN makes it difficult to understand how a given distribution over the parameters enforces any interpretable constraint on the output space. Gaussian processes (GPs), on the other hand, are often preferred in uncertainty quantification tasks due to their interpretability. The drawback is that GPs are limited to small datasets unless advanced techniques are used, and these techniques often rely on the covariance kernel having a specific structure. To address these challenges, we introduce a new class of priors for BNNs, called Mercer priors, such that the resulting BNN has samples that approximate those of a specified GP. The method works by defining a prior directly over the network parameters from the Mercer representation of the covariance kernel, and does not rely on the network having a specific structure. In doing so, we can exploit the scalability of BNNs in a meaningful Bayesian way.
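To make the "Mercer representation of the covariance kernel" concrete, the sketch below illustrates only that ingredient, not the paper's construction of a prior over network parameters. It assumes a squared-exponential kernel on the interval [0, 1], approximates the leading Mercer eigenpairs by an eigendecomposition of the kernel matrix on a uniform grid, and draws approximate GP samples from the truncated expansion f(x) ≈ Σ_m sqrt(λ_m) z_m φ_m(x) with z_m ~ N(0, 1). The kernel choice, the grid discretisation, and all function names (`rbf_kernel`, `mercer_terms`, `sample_truncated_gp`) are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(x, y, lengthscale=0.2, variance=1.0):
    """Squared-exponential covariance k(x, y) for 1-D inputs (illustrative choice)."""
    diff = x[:, None] - y[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def mercer_terms(grid, num_terms, kernel=rbf_kernel):
    """Approximate the leading Mercer eigenpairs of `kernel` on a uniform grid.

    The integral operator is discretised with equal quadrature weights, so the
    eigenpairs of (weight * K) stand in for the eigenvalues and eigenfunctions.
    """
    weight = (grid[-1] - grid[0]) / grid.shape[0]
    K = kernel(grid, grid)
    eigvals, eigvecs = np.linalg.eigh(weight * K)
    order = np.argsort(eigvals)[::-1][:num_terms]  # largest eigenvalues first
    lam = np.clip(eigvals[order], 0.0, None)       # remove tiny negative round-off
    phi = eigvecs[:, order] / np.sqrt(weight)      # sum(phi_m**2) * weight == 1
    return lam, phi


def sample_truncated_gp(lam, phi, num_samples, rng):
    """Draw samples f = sum_m sqrt(lam_m) * z_m * phi_m with z_m ~ N(0, 1)."""
    z = rng.standard_normal((lam.shape[0], num_samples))
    return phi @ (np.sqrt(lam)[:, None] * z)


grid = np.linspace(0.0, 1.0, 200)
lam, phi = mercer_terms(grid, num_terms=20)

# The truncated expansion sum_m lam_m * phi_m(x) * phi_m(y) approximates k(x, y).
K_approx = (phi * lam) @ phi.T
max_error = np.max(np.abs(K_approx - rbf_kernel(grid, grid)))

rng = np.random.default_rng(0)
samples = sample_truncated_gp(lam, phi, num_samples=5, rng=rng)  # shape (200, 5)
```

Each column of `samples` is one approximate draw from the GP evaluated on `grid`, and `max_error` measures how well the 20-term expansion reproduces the kernel matrix. How the Mercer prior maps such an expansion onto the parameters of a neural network is specific to the paper and is not reproduced here.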