🤖 AI Summary
This work addresses regression with high-dimensional outputs and scarce training data. Conventional multi-output Gaussian processes (GPs) scale poorly to many outputs, while compress-then-predict approaches such as PCA-GP rely on fixed, task-agnostic bases chosen for reconstruction rather than prediction. To overcome these limitations, we propose Gaussian process latent factor regression (GPLFR), which models high-dimensional outputs as linear-Gaussian decodings of low-dimensional latent states governed by GP priors. By analytically marginalizing the decoder weights, GPLFR optimizes dimensionality reduction and prediction jointly, end to end. Combining variational inference with scalable kernel methods, GPLFR substantially outperforms baselines such as PCA-GP when emulating global climate models of rocky exoplanets, yielding the first spatially resolved emulator for this low-data, high-dimensional setting.
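To make the "fixed, task-agnostic basis" point concrete, here is a minimal NumPy sketch of a generic PCA-GP baseline. It assumes a squared-exponential kernel with fixed, hand-set hyperparameters shared by all components, and returns only the GP posterior mean; the names `rbf_kernel` and `pca_gp_fit_predict` are hypothetical and not taken from the paper.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between the rows of A (n, p) and B (m, p)
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def pca_gp_fit_predict(X, Y, X_test, n_components=2, noise=1e-2, lengthscale=1.0):
    # Hypothetical PCA-GP baseline: compress Y, regress the scores, decompress.
    # 1) Compress: the PCA basis is fit to Y alone (reconstruction), ignoring X.
    mean = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    basis = Vt[:n_components]                     # (K, D) principal directions
    scores = (Y - mean) @ basis.T                 # (N, K) training scores
    # 2) Predict: GP posterior mean for each score, one shared kernel for all K.
    K = rbf_kernel(X, X, lengthscale) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, scores)            # (N, K)
    scores_test = rbf_kernel(X_test, X, lengthscale) @ alpha  # (M, K)
    # 3) Decompress: map predicted scores back to the D-dimensional output space.
    return mean + scores_test @ basis             # (M, D)
```

Because step 1 never looks at `X`, the retained components are the directions of largest output variance, which need not be the directions most predictable from the inputs.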
📝 Abstract
In the sciences, regression tasks often require predicting high-dimensional outputs from few training examples. Multi-output Gaussian processes excel in low-data regimes but typically struggle with high-dimensional outputs. Compress-then-predict pipelines such as PCA-GP (principal component analysis plus Gaussian process regression) handle high dimensionality, but rely on bases optimized for reconstruction rather than prediction. To address this gap, we propose a model that represents each output as a linear-Gaussian decoding of a low-dimensional latent state drawn from a Gaussian process prior. By analytically marginalizing the decoder weights, we couple compression and prediction in a single objective that scales to high-dimensional outputs. We refer to this model as Gaussian process latent factor regression (GPLFR). We demonstrate GPLFR by building the first spatially resolved emulator of global climate models for rocky exoplanets.
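The marginalization step can be sketched as follows, under stated assumptions: each decoder weight has an independent unit-variance Gaussian prior, output noise is isotropic, and each latent dimension has an RBF-kernel GP prior over the inputs. Integrating the decoder W (K × D) out of Y = ZW + E then makes every output column y_d independently Gaussian with covariance ZZᵀ + σ²I, so the objective depends on D only through a sum over columns and its cost grows linearly in D. The function names, the kernel, and the choice to evaluate a MAP-style log joint (rather than a variational bound) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between the rows of A (n, p) and B (m, p)
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def gaussian_logpdf_cols(M, C):
    # Sum over columns m_j of M of log N(m_j; 0, C), using a Cholesky factor of C
    n, j = M.shape
    L = np.linalg.cholesky(C)
    A = np.linalg.solve(L, M)                     # L^{-1} M, shape (n, j)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))     # log|C|
    return -0.5 * (j * logdet + np.sum(A**2) + n * j * np.log(2.0 * np.pi))


def log_joint(Z, X, Y, noise_var=1e-2, lengthscale=1.0, jitter=1e-6):
    # Z: (N, K) latent states, X: (N, p) inputs, Y: (N, D) outputs.
    n = len(X)
    # Decoder W (K, D) with iid N(0, 1) entries integrated out of Y = Z W + E:
    # each output column y_d ~ N(0, Z Z^T + noise_var * I), independently over d.
    C_y = Z @ Z.T + noise_var * np.eye(n)
    log_lik = gaussian_logpdf_cols(Y, C_y)        # cost O(N^3 + N^2 D)
    # GP prior: each latent column z_k ~ N(0, K_x) with an RBF kernel over inputs.
    K_x = rbf_kernel(X, X, lengthscale) + jitter * np.eye(n)
    log_prior = gaussian_logpdf_cols(Z, K_x)
    return log_lik + log_prior
```

Maximizing `log_joint` over `Z` (and the kernel and noise hyperparameters) with a gradient-based optimizer would couple compression and prediction, since the same `Z` must both explain `Y` and vary smoothly in `X`. The summary above mentions variational inference; this MAP-style sketch does not reproduce it.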