🤖 AI Summary
This work systematically evaluates Deep Gaussian Processes (DGPs) and Deep Sigma Point Processes (DSPPs) for uncertainty calibration and robustness under distributional shift, benchmarking them against Deep Ensembles and other baselines. It is the first empirical study to analyze the calibration-robustness trade-off between DGPs and DSPPs within a unified framework, combining Bayesian hierarchical modeling, sigma-point approximations, and quantitative metrics (Negative Log-Likelihood, NLL, and Expected Calibration Error, ECE) with synthetically induced feature-level distribution shifts. Results show that DSPPs achieve superior in-distribution performance on the CASP (regression) and ESR (classification) benchmarks (NLL ↓12%, ECE ↓35% vs. baselines), yet suffer severe degradation in both calibration and predictive accuracy under distributional shift. In contrast, Deep Ensembles demonstrate more balanced robustness across diverse shift scenarios. The study delineates the operational boundaries of DSPPs' calibration advantage and provides practical empirical guidance for selecting uncertainty-aware models in trustworthy deep learning.
📝 Abstract
Reliable uncertainty estimates are crucial in modern machine learning. Deep Gaussian Processes (DGPs) and Deep Sigma Point Processes (DSPPs) extend GPs hierarchically, offering promising methods for uncertainty quantification grounded in Bayesian principles. However, their empirical calibration and robustness under distribution shift, relative to baselines such as Deep Ensembles, remain understudied. This work evaluates these models on regression (CASP dataset) and classification (ESR dataset) tasks, assessing predictive performance (MAE, Accuracy), calibration via Negative Log-Likelihood (NLL) and Expected Calibration Error (ECE), and robustness under various synthetic feature-level distribution shifts. Results indicate that DSPPs provide strong in-distribution calibration, leveraging their sigma point approximations. Under the tested shifts, however, the GP-based methods degraded markedly in both performance and calibration, whereas Deep Ensembles proved substantially more robust. Our findings underscore ensembles as a strong baseline and suggest that, while deep GP methods offer good in-distribution calibration, their practical robustness under distribution shift requires careful evaluation. To facilitate reproducibility, we make our code available at https://github.com/matthjs/xai-gp.
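Calibration in this work is measured with Expected Calibration Error (ECE). For reference, the sketch below shows a standard equal-width-binning formulation of ECE for classification; the paper's exact binning scheme and bin count are not specified here, so this is an illustrative implementation rather than the authors' code.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins.

    probs:  (N, C) array of predicted class probabilities.
    labels: (N,) array of integer class labels.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidences = probs.max(axis=1)          # top-class probability
    predictions = probs.argmax(axis=1)       # predicted class
    correct = (predictions == labels).astype(float)

    # Assign each prediction to a confidence bin [k/n_bins, (k+1)/n_bins).
    bin_ids = np.clip((confidences * n_bins).astype(int), 0, n_bins - 1)

    ece = 0.0
    for k in range(n_bins):
        mask = bin_ids == k
        if mask.any():
            # Weight each bin's |accuracy - mean confidence| gap by its share of samples.
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece
```

A perfectly calibrated model (e.g., 90% confidence with 90% accuracy in that bin) yields an ECE of zero; larger values indicate over- or under-confidence.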