🤖 AI Summary
This work addresses miscalibrated lower-tail predictions of Gaussian processes (GPs) in Bayesian optimization, which arise from kernel and hyperparameter choices and distort acquisition functions such as expected improvement (EI). Focusing on the noiseless setting, the paper introduces a goal-oriented tail calibration framework for predictions below a low threshold $t$, built on two notions of spatial calibration: occurrence calibration over the design space and thresholded $\mu$-calibration on sublevel sets. Building on this framework, it proposes tcGP, a post-hoc method that calibrates GP predictive distributions below $t$, and shows that the resulting EI-based optimization algorithm remains dense in the design space. Experiments on standard benchmarks show improved lower-tail calibration and Bayesian optimization performance relative to standard and globally calibrated GP models.
📝 Abstract
Bayesian optimization (BO) selects evaluation points for expensive black-box objectives using Gaussian process (GP) predictive distributions. Kernel choice and hyperparameter selection can lead to miscalibrated predictive distributions and an inappropriate exploration-exploitation trade-off. For minimization, sampling criteria such as expected improvement (EI) depend on the predictive distribution below the current best value, so lower-tail miscalibration directly affects the sampling decision. This article studies goal-oriented calibration of GP predictive distributions below a low threshold $t$ in the noiseless setting, for standard GP models with hyperparameters selected by maximum likelihood. A framework for predictive reliability below $t$ is introduced, based on two notions of spatial calibration: occurrence calibration over the design space and thresholded $\mu$-calibration on sublevel sets of the form $\{x\in\mathbb{X}, f(x)\le t\}$. Building on this framework, we propose tcGP, a post-hoc method that calibrates GP predictive distributions below $t$, and we show that the resulting EI-based global optimization algorithm remains dense in the design space. Experiments on standard benchmarks show improved lower-tail calibration and BO performance relative to standard GP models and globally calibrated GP models.
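
To illustrate why EI depends on the lower tail of the predictive distribution, the following minimal sketch evaluates the standard closed-form EI for minimization under a Gaussian predictive distribution $\mathcal{N}(\mu, \sigma^2)$. It is not the tcGP method, and the function name `expected_improvement` and its parameters are hypothetical, chosen only for this example.

```python
import math


def expected_improvement(mu, sigma, best):
    """Closed-form EI for minimization under a Gaussian predictive N(mu, sigma^2).

    mu, sigma: predictive mean and standard deviation at a candidate point.
    best: current best (lowest) observed value.
    All names are illustrative assumptions, not taken from the paper.
    """
    improvement = best - mu
    if sigma <= 0.0:
        # No predictive uncertainty (e.g. an already-evaluated point in the
        # noiseless setting): EI reduces to the deterministic improvement.
        return max(improvement, 0.0)
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return improvement * cdf + sigma * pdf


# Same mean, different predictive spread: only the mass below `best` matters.
ei_narrow = expected_improvement(mu=1.0, sigma=0.5, best=0.0)
ei_wide = expected_improvement(mu=1.0, sigma=1.0, best=0.0)
print(ei_narrow, ei_wide)
```

With the mean held fixed above the current best, EI is driven entirely by how much probability the model places below `best`. An overconfident or underconfident lower tail therefore changes which point is sampled next.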