🤖 AI Summary
Gaussian processes face three main challenges in large-scale spatial prediction: high computational cost, the trade-off between accuracy and efficiency in low-rank approximations, and sensitivity to contaminated data. This work proposes an ensemble of predictive processes (PPs) built on spatial partitioning, together with a multi-resolution scheme over possibly overlapping partitions that balances scalability and predictive accuracy with a fixed number of inducing points. Theoretical analysis establishes, for the first time, that PPs with a fixed number of inducing points are asymptotically robust to data contamination. Simulations and large-scale geostatistical applications indicate that the proposed method delivers accurate predictions with computational gains and remains robust under data contamination.
📝 Abstract
Gaussian processes provide a flexible framework for spatial prediction, but their computational cost limits their applicability to data with large sample size $n$. Predictive processes (PPs), a popular low-rank approximation, mitigate this burden by projecting the original process onto a reduced set of $m\ll n$ inducing points. However, existing theory requires $m$ to grow with $n$, creating a trade-off between accuracy and computational efficiency. We address this challenge by introducing an ensemble of PPs based on spatial partitioning, and propose a novel partitioning and patching scheme with desirable properties. Generalizing the convergence results for PPs lets us explicitly balance scalability and accuracy: increasing the number of ensemble components slows convergence but substantially improves computational efficiency. We further show theoretically that, despite the limited approximation accuracy of PPs with fixed $m$, they are asymptotically robust to data contamination. Motivated by this insight, we finally introduce a multi-resolution ensemble that combines fixed-$m$ PPs with multiple ensembles defined over possibly overlapping coarse-to-fine partitions. Simulations and large-scale geostatistical applications demonstrate that our approach delivers accurate, robust predictions with computational gains, providing a practical and broadly applicable solution for spatial prediction.
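To make the low-rank idea concrete, the following is a minimal sketch, not the authors' implementation. It computes the standard predictive-process posterior mean from $m$ inducing points, then applies one PP per block of a simple non-overlapping partition. The squared-exponential kernel, the hyperparameter values, the one-dimensional domain $[0, 1]$, and the function names `sq_exp_kernel`, `pp_predict`, and `partitioned_pp_predict` are all assumptions made for illustration. The hard block assignment stands in for the paper's partitioning and patching scheme, which the abstract does not specify.

```python
import numpy as np

def sq_exp_kernel(a, b, lengthscale=0.1, variance=1.0):
    """Squared-exponential covariance between rows of a (p, d) and b (q, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def pp_predict(x_train, y_train, x_test, inducing, noise_var=0.01, jitter=1e-6):
    """Predictive-process posterior mean using m inducing points.

    Only an m x m system is solved, so the cost is O(n m^2) rather than O(n^3).
    """
    k_mm = sq_exp_kernel(inducing, inducing)   # (m, m)
    k_mn = sq_exp_kernel(inducing, x_train)    # (m, n)
    k_sm = sq_exp_kernel(x_test, inducing)     # (n_test, m)
    # mean = K_sm (noise_var * K_mm + K_mn K_nm)^{-1} K_mn y
    a = noise_var * k_mm + k_mn @ k_mn.T + jitter * np.eye(len(inducing))
    return k_sm @ np.linalg.solve(a, k_mn @ y_train)

def partitioned_pp_predict(x_train, y_train, x_test, n_blocks=4, m=6):
    """Hypothetical hard-assignment ensemble: one PP per equal-width block of [0, 1]."""
    def block_of(x):
        return np.minimum((x[:, 0] * n_blocks).astype(int), n_blocks - 1)

    train_block, test_block = block_of(x_train), block_of(x_test)
    pred = np.empty(len(x_test))
    for j in range(n_blocks):
        tr, te = train_block == j, test_block == j
        if not te.any():
            continue
        # m inducing points on a regular grid inside block j
        inducing = np.linspace(j / n_blocks, (j + 1) / n_blocks, m)[:, None]
        pred[te] = pp_predict(x_train[tr], y_train[tr], x_test[te], inducing)
    return pred

# Synthetic 1-D example: noisy observations of a smooth function.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(400, 1))
y_train = np.sin(2 * np.pi * x_train[:, 0]) + 0.1 * rng.standard_normal(400)
x_test = np.linspace(0.05, 0.95, 50)[:, None]
truth = np.sin(2 * np.pi * x_test[:, 0])

global_pred = pp_predict(x_train, y_train, x_test, np.linspace(0.0, 1.0, 15)[:, None])
ensemble_pred = partitioned_pp_predict(x_train, y_train, x_test)

rmse_global = float(np.sqrt(np.mean((global_pred - truth) ** 2)))
rmse_ensemble = float(np.sqrt(np.mean((ensemble_pred - truth) ** 2)))
print(f"global PP RMSE: {rmse_global:.3f}, partitioned PP RMSE: {rmse_ensemble:.3f}")
```

In this sketch each block sees only its own data and its own small set of inducing points, so the per-block solves are independent and could run in parallel. That is the computational benefit of partitioning described in the abstract; the price is that each component uses less information near block boundaries.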