AI Summary
Standard inference for multi-output Gaussian processes (MOGPs) incurs prohibitive computational cost (cubic in the number of observations) when computing posterior means, especially for large-scale spatiotemporal data.
Method: This paper proposes an efficient inference framework tailored to separable spatiotemporal covariance structures. It reformulates noisy MOGP inference as the solution of a large-scale Stein equation and introduces the first integration of a low-rank preconditioned conjugate gradient (LRPCG) method with the Sylvester equation solver KPIK, adapted to Stein equations. Additionally, a degree-weighted average covariance matrix is incorporated to accelerate convergence. The method combines low-rank approximation, Kronecker factorization, and graph-filter-based covariance modeling.
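To make the reformulation concrete, here is a minimal NumPy sketch (a toy illustration with random stand-in matrices, not the paper's LRPCG/KPIK solver): with a separable covariance Ks ⊗ Kt plus noise σ²I, the linear system for the posterior mean is equivalent to a Stein-type matrix equation Kt M Ks + σ²M = Y, which can be solved via the classic eigendecomposition trick without ever forming the full Kronecker matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_t = 6, 4                      # toy spatial / temporal sizes
A = rng.standard_normal((n_s, n_s))
Ks = A @ A.T + n_s * np.eye(n_s)     # SPD spatial covariance (stand-in)
B = rng.standard_normal((n_t, n_t))
Kt = B @ B.T + n_t * np.eye(n_t)     # SPD temporal covariance (stand-in)
sigma2 = 0.1                         # noise variance
y = rng.standard_normal(n_s * n_t)   # vectorized observations

# Dense reference solve of (Ks ⊗ Kt + σ²I) x = y -- O((n_s·n_t)^3), what we want to avoid
x_ref = np.linalg.solve(np.kron(Ks, Kt) + sigma2 * np.eye(n_s * n_t), y)

# Equivalent Stein-type matrix equation Kt M Ks + σ²M = Y with vec(M) = x,
# via the identity vec(Kt M Ks) = (Ks ⊗ Kt) vec(M) for symmetric Ks.
# Solve it with the eigendecomposition trick, never forming the Kronecker product:
ls, Us = np.linalg.eigh(Ks)
lt, Ut = np.linalg.eigh(Kt)
Y = y.reshape((n_t, n_s), order="F")     # un-vectorize (column-major)
Z = Ut.T @ Y @ Us                        # rotate into the joint eigenbasis
Z /= np.outer(lt, ls) + sigma2          # divide by eigenvalues of Ks ⊗ Kt + σ²I
X = Ut @ Z @ Us.T                        # rotate back
x_kron = X.reshape(-1, order="F")

print(np.allclose(x_ref, x_kron))        # True
```

The eigendecomposition route shown here is only a well-known baseline for Kronecker-structured systems; the paper's contribution is an iterative low-rank alternative that scales beyond the point where dense eigendecompositions are feasible.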
Results: Evaluated on real-world road network data, the approach significantly reduces memory footprint and accelerates posterior mean computation by an order of magnitude, enabling scalable, real-time MOGP inference: a novel paradigm for large-scale multi-output spatial statistics.
Abstract
Gaussian processes (GPs) are a versatile tool in machine learning and computational science. Here we consider multi-output Gaussian processes (MOGPs) and present low-rank approaches for efficiently computing the posterior mean of an MOGP. Starting from low-rank spatio-temporal data, we consider a structured covariance function, assuming separability across space and time. This separability, in turn, yields a decomposition of the covariance matrix into a Kronecker product of individual covariance matrices. Incorporating the typical noise term into the model then requires the solution of a large-scale Stein equation to compute the posterior mean. For this, we propose efficient low-rank methods based on the combination of a low-rank preconditioned conjugate gradient (LRPCG) method with the Sylvester equation solver KPIK, adjusted for solving Stein equations. We test the developed method on real-world street network graphs, using graph filters as covariance matrices. Moreover, we propose a degree-weighted average covariance matrix, which can be employed under specific assumptions to achieve faster convergence.
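The LRPCG component of the abstract rests on the fact that conjugate gradients only needs matrix-vector products, and the Kronecker structure makes each product cheap: applying Ks ⊗ Kt to a vector reduces to two small matrix multiplications. A hedged sketch (plain, unpreconditioned CG with random stand-in matrices, omitting the paper's low-rank preconditioner and KPIK stage):

```python
import numpy as np

rng = np.random.default_rng(1)
n_s, n_t = 8, 5
A = rng.standard_normal((n_s, n_s))
Ks = A @ A.T + n_s * np.eye(n_s)     # SPD spatial factor (stand-in)
B = rng.standard_normal((n_t, n_t))
Kt = B @ B.T + n_t * np.eye(n_t)     # SPD temporal factor (stand-in)
sigma2 = 0.5
y = rng.standard_normal(n_s * n_t)

def matvec(v):
    """Apply (Ks ⊗ Kt + σ²I) without forming the n_s·n_t × n_s·n_t matrix,
    using vec(Kt V Ks) = (Ks ⊗ Kt) vec(V) for symmetric Ks."""
    V = v.reshape((n_t, n_s), order="F")
    return (Kt @ V @ Ks + sigma2 * V).reshape(-1, order="F")

def conjugate_gradient(matvec, b, tol=1e-10, maxiter=500):
    """Textbook CG for an SPD operator available only through matvec."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

x_cg = conjugate_gradient(matvec, y)
x_ref = np.linalg.solve(np.kron(Ks, Kt) + sigma2 * np.eye(n_s * n_t), y)
print(np.allclose(x_cg, x_ref))      # True
```

Each CG iteration costs O(n_s·n_t·(n_s + n_t)) instead of the O((n_s·n_t)²) of a dense matrix-vector product, which is the structural saving the paper's low-rank preconditioning then builds on.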