🤖 AI Summary
This paper studies the calibration of Gaussian process (GP) predictive distributions in the interpolation setting from a design-marginal perspective. Conditioning on the data and averaging over a design measure μ, it formalizes μ-coverage for central intervals and μ-probabilistic calibration via the randomized probability integral transform (RPIT). Two methods are proposed. CPS-GP adapts conformal predictive systems to GP interpolation using standardized leave-one-out residuals, yielding stepwise predictive distributions with finite-sample marginal calibration. BCR-GP keeps the GP posterior mean and replaces the Gaussian residual with a generalized normal model fitted to cross-validated standardized residuals; a Bayesian selection rule, based on either a posterior upper quantile of the variance or a cross-posterior Kolmogorov–Smirnov (KS) criterion, controls dispersion and tail behavior and produces smooth predictive distributions suited to sequential design. Numerical experiments on benchmark functions compare CPS-GP and BCR-GP with Jackknife+ for GPs and the full conformal GP, using calibration metrics (coverage, KS statistic, integral absolute error) and the scaled continuous ranked probability score (CRPS) for accuracy or sharpness.
📝 Abstract
We study the calibration of Gaussian process (GP) predictive distributions in the interpolation setting from a design-marginal perspective. Conditioning on the data and averaging over a design measure μ, we formalize μ-coverage for central intervals and μ-probabilistic calibration through randomized probability integral transforms. We introduce two methods. The first, cps-gp, adapts conformal predictive systems to GP interpolation using standardized leave-one-out residuals, yielding stepwise predictive distributions with finite-sample marginal calibration. The second, bcr-gp, retains the GP posterior mean and replaces the Gaussian residual with a generalized normal model fitted to cross-validated standardized residuals. A Bayesian selection rule, based either on a posterior upper quantile of the variance (for conservative prediction) or on a cross-posterior Kolmogorov-Smirnov criterion (for probabilistic calibration), controls dispersion and tail behavior while producing smooth predictive distributions suitable for sequential design. Numerical experiments on benchmark functions compare cps-gp, bcr-gp, Jackknife+ for GPs, and the full conformal Gaussian process, using calibration metrics (coverage, Kolmogorov-Smirnov, integral absolute error) and measuring accuracy or sharpness through the scaled continuous ranked probability score.
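To make the idea of a stepwise predictive distribution built from standardized leave-one-out residuals more concrete, the following is a minimal sketch and not the paper's construction. It assumes a zero-mean GP with a fixed RBF kernel (the kernel, its hyperparameters, the jitter, and all function names are hypothetical choices for illustration), uses the standard closed-form leave-one-out identities for GP regression, and places one atom per residual around the posterior mean at a new point. The smoothing value tau is fixed here for reproducibility, whereas a randomized version would draw it uniformly on (0, 1). No finite-sample guarantee is claimed for this simplified version.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    # Squared-exponential kernel on 1-D inputs (hypothetical hyperparameters).
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def loo_standardized_residuals(K, y):
    # Closed-form leave-one-out for a zero-mean GP:
    #   y_i - mean_{-i}(x_i) = [K^{-1} y]_i / [K^{-1}]_ii
    #   var_{-i}(x_i)        = 1 / [K^{-1}]_ii
    K_inv = np.linalg.inv(K)
    diag = np.diag(K_inv)
    loo_err = (K_inv @ y) / diag
    loo_sd = 1.0 / np.sqrt(diag)
    return loo_err / loo_sd

def gp_posterior(x_train, y_train, x_new, jitter=1e-6):
    # Posterior mean and standard deviation of a zero-mean GP at x_new.
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k_star = rbf_kernel(x_new, x_train)
    mean = k_star @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, k_star.T)
    var = rbf_kernel(x_new, x_new).diagonal() - np.sum(k_star * v.T, axis=1)
    return mean, np.sqrt(np.maximum(var, 0.0)), K

def stepwise_cdf(y_grid, mean, sd, residuals, tau=0.5):
    # One atom per residual, located at mean + sd * r_i; tau smooths the steps.
    atoms = mean + sd * residuals
    counts = (atoms[None, :] <= y_grid[:, None]).sum(axis=1)
    return (counts + tau) / (len(residuals) + 1)

# Toy usage: noiseless observations of a smooth test function.
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2.0 * np.pi * x_train)
x_new = np.array([0.37])

mean, sd, K = gp_posterior(x_train, y_train, x_new)
residuals = loo_standardized_residuals(K, y_train)
y_grid = np.linspace(-2.0, 2.0, 201)
cdf = stepwise_cdf(y_grid, mean[0], sd[0], residuals)
```

The resulting `cdf` is a nondecreasing step function of `y_grid` with values strictly between 0 and 1. A smooth alternative in the spirit of bcr-gp would replace the empirical atoms with a parametric residual model.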