Gaussian Process Latent Factor Regression for Low-Data, High-Dimensional Output Problems

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses high-dimensional output regression under scarce training data. Conventional multi-output Gaussian processes (GPs) scale poorly with output dimension, and existing compress-then-predict approaches such as PCA-GP rely on fixed, task-agnostic bases. The authors propose Gaussian Process Latent Factor Regression (GPLFR), which models each high-dimensional output as a linear-Gaussian decoding of a low-dimensional latent state governed by a GP prior. Analytically marginalizing the decoder weights lets dimensionality reduction and prediction be optimized jointly in a single end-to-end objective. Combining variational inference with scalable kernel methods, GPLFR is reported to substantially outperform baselines such as PCA-GP in global climate modeling of rocky exoplanets, yielding the first efficient, spatially resolved emulator for this low-data, high-dimensional setting.
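The generative structure described in this summary can be sketched in NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the RBF kernel, the standard-normal decoder weights, the isotropic noise, and the names `rbf_kernel` and `sample_latent_factor_outputs` are all hypothetical choices made for the example. It draws latent states from a GP prior over the inputs and decodes them linearly into high-dimensional outputs; it does not perform the marginalization or the variational inference.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between two input sets, shape (n_a, n_b).
    # The kernel family is an assumption made for this sketch.
    sq_dist = (
        np.sum(xa**2, axis=1)[:, None]
        + np.sum(xb**2, axis=1)[None, :]
        - 2.0 * xa @ xb.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)

def sample_latent_factor_outputs(x, n_latent, n_outputs, noise_std=0.1, seed=0):
    # Hypothetical helper: samples from a latent-factor generative model.
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # GP prior covariance over the inputs, with jitter for numerical stability.
    k = rbf_kernel(x, x) + 1e-6 * np.eye(n)
    chol = np.linalg.cholesky(k)
    # Each latent dimension is an independent GP draw: z has shape (n, n_latent).
    z = chol @ rng.standard_normal((n, n_latent))
    # Decoder weights with an assumed standard-normal prior: (n_latent, n_outputs).
    w = rng.standard_normal((n_latent, n_outputs))
    # Linear-Gaussian decoding into the high-dimensional output space.
    y = z @ w + noise_std * rng.standard_normal((n, n_outputs))
    return z, w, y
```

With `noise_std=0.0` the sampled output matrix has rank at most `n_latent`, which is the low-dimensional structure the model relies on.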

📝 Abstract

In the sciences, regression tasks often require predicting high-dimensional outputs from few training examples. Multi-output Gaussian processes excel in low-data regimes but typically struggle with high-dimensional outputs. Compress-then-predict pipelines such as PCA-GP (principal component analysis plus Gaussian process regression) handle high dimensionality, but rely on bases optimized for reconstruction rather than prediction. To address this gap, we propose a model that represents each output as a linear-Gaussian decoding of a low-dimensional latent state drawn from a Gaussian process prior. By analytically marginalizing the decoder weights, we couple compression and prediction in a single objective that scales to high-dimensional outputs. We refer to this model as Gaussian process latent factor regression (GPLFR). We demonstrate GPLFR by building the first spatially resolved emulator of global climate models for rocky exoplanets.
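For contrast, the compress-then-predict baseline named in the abstract can be sketched as follows. This is a minimal NumPy illustration of a generic PCA-GP pipeline, not the configuration used in the paper: the RBF kernel, the single kernel shared across components, the fixed hyperparameters, and the function names are assumptions. Note that the PCA basis is fixed from the training outputs alone, before any regression takes place, which is the decoupling the abstract criticizes.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0):
    # Squared-exponential kernel (an assumed choice), shape (n_a, n_b).
    sq_dist = (
        np.sum(xa**2, axis=1)[:, None]
        + np.sum(xb**2, axis=1)[None, :]
        - 2.0 * xa @ xb.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)

def pca_gp_fit_predict(x_train, y_train, x_test, n_components,
                       lengthscale=1.0, noise_var=1e-4):
    # Step 1 (compress): PCA basis from the SVD of the centered outputs.
    y_mean = y_train.mean(axis=0)
    _, _, vt = np.linalg.svd(y_train - y_mean, full_matrices=False)
    basis = vt[:n_components]                 # (n_components, n_outputs)
    scores = (y_train - y_mean) @ basis.T     # (n_train, n_components)

    # Step 2 (predict): zero-mean GP regression on each PCA score,
    # sharing one kernel across components for simplicity.
    k_tt = rbf_kernel(x_train, x_train, lengthscale)
    k_tt = k_tt + noise_var * np.eye(x_train.shape[0])
    k_st = rbf_kernel(x_test, x_train, lengthscale)
    pred_scores = k_st @ np.linalg.solve(k_tt, scores)

    # Step 3 (decode): map predicted scores back to the output space.
    return y_mean + pred_scores @ basis
```

Every prediction from this pipeline lies in the affine subspace spanned by the training mean and the retained principal components, regardless of how useful those directions are for prediction.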

Problem

Research questions and friction points this paper is trying to address.

low-data

high-dimensional output

multi-output regression

Gaussian process

prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian process

latent factor

multi-output regression