GPU-Resident Gaussian Process Regression Leveraging Asynchronous Tasks with HPX

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Gaussian Process Regression (GPR) struggles to scale to large datasets due to the cubic time complexity of exact inference. This work proposes the first fully GPU-resident GPR prediction pipeline, integrating block-wise Cholesky decomposition with efficient GPU memory management and leveraging the HPX asynchronous task runtime alongside multiple CUDA streams for coordinated scheduling. The approach delivers substantial speedups while preserving numerical stability: for datasets with more than 128 training samples, it accelerates the Cholesky decomposition by up to 4.3× and overall prediction by up to 4.6×. For large datasets, the method surpasses cuSOLVER's performance by up to 11%.
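The block-wise Cholesky factorization at the core of the pipeline can be sketched in plain NumPy. This is an illustrative right-looking blocked variant, not GPRat's actual implementation: the library dispatches the per-tile POTRF/TRSM/GEMM operations as HPX tasks onto CUDA streams, whereas the loop below runs them sequentially on the CPU. The function name and tile size are assumptions for the sketch.

```python
import numpy as np

def tiled_cholesky(A, tile):
    """Blocked (right-looking) Cholesky: returns lower-triangular L with A = L @ L.T.

    Each tile operation below maps to a task in a tiled GPU pipeline:
    POTRF on the diagonal tile, TRSM on the panel, GEMM/SYRK on the trailing matrix.
    """
    n = A.shape[0]
    A = A.copy()              # trailing updates are applied in place
    L = np.zeros_like(A)
    for k in range(0, n, tile):
        kk = slice(k, min(k + tile, n))
        # POTRF: factorize the diagonal tile
        L[kk, kk] = np.linalg.cholesky(A[kk, kk])
        # TRSM: solve for the panel below the diagonal, L_ik = A_ik @ L_kk^{-T}
        for i in range(k + tile, n, tile):
            ii = slice(i, min(i + tile, n))
            L[ii, kk] = np.linalg.solve(L[kk, kk], A[ii, kk].T).T
        # GEMM/SYRK: update the trailing lower triangle
        for i in range(k + tile, n, tile):
            ii = slice(i, min(i + tile, n))
            for j in range(k + tile, i + tile, tile):
                jj = slice(j, min(j + tile, n))
                A[ii, jj] -= L[ii, kk] @ L[jj, kk].T
    return L
```

Because each tile operation depends only on a few predecessor tiles, the three loops expose a task graph that a runtime such as HPX can schedule asynchronously across CUDA streams, which is the parallelism the paper exploits.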

📝 Abstract
Gaussian processes (GPs) are a widely used regression tool, but the cubic complexity of exact solvers limits their scalability. To address this challenge, we extend the GPRat library by incorporating a fully GPU-resident GP prediction pipeline. GPRat is an HPX-based library that combines task-based parallelism with an intuitive Python API. We implement tiled algorithms for the GP prediction using optimized CUDA libraries, thereby exploiting massive parallelism for linear algebra operations. We evaluate the optimal number of CUDA streams and compare the performance of our GPU implementation to the existing CPU-based implementation. Our results show the GPU implementation provides speedups for datasets larger than 128 training samples. We observe speedups of up to 4.3× for the Cholesky decomposition itself and 4.6× for the GP prediction. Furthermore, combining HPX with multiple CUDA streams allows GPRat to match, and for large datasets, surpass cuSOLVER's performance by up to 11 percent.
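For reference, the linear algebra that the tiled GPU pipeline accelerates is the standard Cholesky-based exact GP prediction: factorize the kernel matrix once, then obtain the posterior mean and variance through triangular solves. The NumPy sketch below (RBF kernel, hypothetical `gp_predict` helper) illustrates these steps; it is not GPRat's API.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    """Squared-exponential (RBF) kernel between two sets of points."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(X, y, Xs, lengthscale=1.0, noise=1e-6):
    """Exact GP posterior mean and variance at test points Xs.

    The O(n^3) cost lives in the Cholesky factorization of K; the
    remaining steps are O(n^2) triangular solves and matrix products.
    """
    K = rbf(X, X, lengthscale) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)                     # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(Xs, X, lengthscale)
    mean = Ks @ alpha                             # posterior mean
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(rbf(Xs, Xs, lengthscale) - v.T @ v)  # posterior variance
    return mean, var
```

The factorization dominates the cubic cost, which is why the paper focuses its tiling and stream-level scheduling on the Cholesky step.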
Problem

Research questions and friction points this paper is trying to address.

Gaussian Process Regression · cubic complexity · scalability · GPU acceleration · large-scale datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU-resident · Gaussian Process Regression · HPX · asynchronous tasks · tiled algorithms
Henrik Möllmann
Institute of Parallel and Distributed Systems, University of Stuttgart, 70569 Stuttgart, Germany
Dirk Pflüger
University of Stuttgart
Scientific Computing · High-Performance Computing · High-Dimensional Approximation · Numerical Machine Learning
Alexander Strack
Institute of Parallel and Distributed Systems, University of Stuttgart, 70569 Stuttgart, Germany