Bayesian neural networks with interpretable priors from Mercer kernels

📅 2025-10-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Bayesian neural networks (BNNs) rarely have interpretable priors over their parameters, while Gaussian processes (GPs) are interpretable but hard to scale to large datasets without advanced techniques. Method: The paper proposes the Mercer prior, a prior over BNN parameters constructed directly in parameter space from the Mercer representation of a user-specified covariance kernel. The construction does not rely on the network having a specific structure, and it is designed so that samples from the resulting BNN approximate those of the specified GP. Contribution/Results: The Mercer prior ties the prior over parameters to explicit, kernel-defined behavior in the output space, so the scalability of BNNs can be exploited in a meaningful Bayesian way, combining the interpretability of GPs with the scalability of BNNs.

📝 Abstract

Quantifying the uncertainty in the output of a neural network is essential for deployment in scientific or engineering applications where decisions must be made under limited or noisy data. Bayesian neural networks (BNNs) provide a framework for this purpose by constructing a Bayesian posterior distribution over the network parameters. However, the prior, which is of key importance in any Bayesian setting, is rarely meaningful for BNNs. This is because the complexity of the input-to-output map of a BNN makes it difficult to understand how certain distributions enforce any interpretable constraint on the output space. Gaussian processes (GPs), on the other hand, are often preferred in uncertainty quantification tasks due to their interpretability. The drawback is that GPs are limited to small datasets without advanced techniques, which often rely on the covariance kernel having a specific structure. To address these challenges, we introduce a new class of priors for BNNs, called Mercer priors, such that the resulting BNN has samples which approximate those of a specified GP. The method works by defining a prior directly over the network parameters from the Mercer representation of the covariance kernel, and does not rely on the network having a specific structure. In doing so, we can exploit the scalability of BNNs in a meaningful Bayesian way.
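
The Mercer representation referred to in the abstract can be illustrated numerically. The following is a minimal sketch and not the paper's construction: it does not build a prior over network parameters, it only shows how a covariance kernel can be expanded into eigenvalue and eigenfunction pairs and how a truncated expansion yields approximate GP samples. The squared-exponential kernel, the unit interval, the uniform-grid quadrature, and all function names (`rbf_kernel`, `mercer_eigenpairs`, `sample_truncated_gp`) are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(x, y, lengthscale=0.2):
    """Squared-exponential kernel matrix between 1-D point sets x and y."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def mercer_eigenpairs(x, kernel, num_terms):
    """Approximate the leading Mercer eigenpairs of `kernel` on a uniform grid x.

    The kernel integral operator is discretized with equal quadrature
    weights w, so the eigenvalues of w * K approximate the operator's
    eigenvalues and eigenvectors / sqrt(w) approximate eigenfunctions
    normalized in the discrete L2 sense.
    """
    w = (x[-1] - x[0]) / x.size
    evals, evecs = np.linalg.eigh(w * kernel(x, x))
    order = np.argsort(evals)[::-1][:num_terms]  # largest eigenvalues first
    lam = np.clip(evals[order], 0.0, None)       # guard against round-off negatives
    phi = evecs[:, order] / np.sqrt(w)
    return lam, phi


def sample_truncated_gp(lam, phi, num_samples, rng):
    """Draw f = sum_i sqrt(lam_i) * z_i * phi_i with z_i ~ N(0, 1)."""
    z = rng.standard_normal((lam.size, num_samples))
    return phi @ (np.sqrt(lam)[:, None] * z)


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
lam, phi = mercer_eigenpairs(x, rbf_kernel, num_terms=20)
samples = sample_truncated_gp(lam, phi, num_samples=5, rng=rng)

# The truncated expansion sum_i lam_i * phi_i(x) * phi_i(x') should
# reproduce the kernel matrix closely when the eigenvalues decay quickly.
K_approx = (phi * lam) @ phi.T
max_err = np.max(np.abs(K_approx - rbf_kernel(x, x)))
```

Each column of `samples` is one approximate draw from a zero-mean GP with the chosen kernel, evaluated on the grid. `max_err` measures how well the truncated expansion reproduces the kernel matrix, which is a quick way to check whether `num_terms` is large enough for a given lengthscale.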

Problem

Research questions and friction points this paper is trying to address.

BNNs lack interpretable priors for meaningful uncertainty quantification

Gaussian processes are interpretable but limited to small datasets

How to define a prior over BNN parameters so that the network's samples approximate those of a specified Gaussian process

Innovation

Methods, ideas, or system contributions that make the work stand out.

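Mercer priors: priors over BNN parameters built from the Mercer representation of a covariance kernel

Samples from the resulting BNN approximate those of a specified Gaussian process, with no requirement on network structure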

Method enables scalable Bayesian learning with meaningful uncertainty
