🤖 AI Summary
Traditional preference-based Bayesian optimization (PBO) relies on Gaussian processes, whose non-conjugate likelihoods entail expensive per-iteration inference, hindering real-time human-in-the-loop interaction. To address this, we propose the first fully amortized PBO framework: it jointly models the latent objective function and the acquisition policy via meta-learning, introduces a Transformer-based neural process architecture tailored for preference learning, and trains the model end-to-end with reinforcement learning and custom auxiliary losses to enable efficient amortized inference. Evaluated on synthetic and real-world benchmarks, our method achieves a 10–1000× speedup over Gaussian-process-based PBO while attaining higher convergence accuracy in most settings. These gains in both computational efficiency and optimization performance make PBO substantially more practical for interactive black-box optimization.
📝 Abstract
Preferential Bayesian Optimization (PBO) is a sample-efficient method for learning latent user utilities from preferential feedback over pairs of designs. It relies on a statistical surrogate model for the latent function, usually a Gaussian process, and an acquisition strategy to select the next candidate pair on which to get user feedback. Due to the non-conjugacy of the associated likelihood, every PBO step requires significant computation with approximate inference techniques. This computational overhead is incompatible with the pace at which humans interact with computers, hindering the use of PBO in real-world settings. Building on recent advances in amortized BO, we propose to circumvent this issue by fully amortizing PBO, meta-learning both the surrogate and the acquisition function. Our method comprises a novel transformer neural process architecture, trained using reinforcement learning and tailored auxiliary losses. On a benchmark of synthetic and real-world datasets, our method is several orders of magnitude faster than the usual Gaussian-process-based strategies and often outperforms them in accuracy.
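To see where the non-conjugacy mentioned above comes from, here is a minimal sketch of a standard pairwise-comparison likelihood commonly used in PBO (a probit/Thurstone-style model; the paper does not specify this exact form, so take it as an illustrative assumption). Because this likelihood is non-Gaussian in the latent utilities, a Gaussian process prior yields no closed-form posterior, forcing approximate inference (e.g. Laplace or expectation propagation) at every step.

```python
import math

def probit_preference_loglik(f_winner, f_loser, noise=1.0):
    """Log-likelihood of observing `winner > loser` (in preference) under a
    probit pairwise-comparison model: P = Phi((f_w - f_l) / (sqrt(2)*noise)),
    where f_w, f_l are latent utilities and Phi is the standard normal CDF.
    The likelihood is non-Gaussian in f, so a GP prior over f is
    non-conjugate and the posterior must be approximated."""
    z = (f_winner - f_loser) / (math.sqrt(2.0) * noise)
    # Standard normal CDF expressed via the error function.
    p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return math.log(p)
```

With equal latent utilities the model is indifferent (probability 0.5 for either outcome), and the log-likelihood grows as the winner's utility exceeds the loser's, which is what the surrogate's posterior update must capture.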