🤖 AI Summary
For kernel matrix-vector multiplication (KMVM) on billion-scale (10⁹ points), low-dimensional (D ≤ 7) datasets, conventional methods are limited by O(N²) memory and time complexity. This work proposes F3M (Faster-Fast and Free Memory Method), a GPU-accelerated KMVM approximation that empirically achieves linear time and memory complexity with a relative error on the order of 10⁻³. F3M integrates three core techniques: GPU-optimized approximate kernel evaluation, low-rank decomposition combined with random projection for compression, and memory-aware tiling. On a high-end GPU, F3M computes a billion-point KMVM in under 60 seconds. When integrated into FALKON, it delivers a 1.5–5.5× speedup with less than 1% accuracy degradation. The method substantially improves the scalability and practicality of kernel-based learning, particularly Gaussian process regression, on massive datasets.
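To make the O(N²) cost concrete: an exact KMVM computes v_i = Σ_j k(x_i, x_j) b_j for every i. The sketch below assumes a Gaussian (RBF) kernel and uses the hypothetical helpers `gaussian_kernel` and `kmvm_tiled`. It builds K one block of rows at a time, so the full N×N matrix is never stored and extra memory drops to O(B·N) for block size B, yet the arithmetic stays quadratic. It is a plain NumPy illustration of tiling, not F3M's GPU implementation.

```python
import numpy as np


def gaussian_kernel(X, Y, lengthscale=1.0):
    """Dense Gaussian kernel block K[i, j] = exp(-||X_i - Y_j||^2 / (2 l^2))."""
    sq = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    np.maximum(sq, 0.0, out=sq)  # clip tiny negatives caused by round-off
    return np.exp(-sq / (2.0 * lengthscale**2))


def kmvm_tiled(X, Y, b, lengthscale=1.0, block=4096):
    """Exact v = K(X, Y) @ b, built one block of rows at a time.

    Extra memory is O(block * len(Y)) instead of O(len(X) * len(Y)),
    but the arithmetic is still O(len(X) * len(Y) * D).
    """
    out = np.empty((X.shape[0],) + b.shape[1:], dtype=np.result_type(X, b))
    for start in range(0, X.shape[0], block):
        stop = min(start + block, X.shape[0])
        out[start:stop] = gaussian_kernel(X[start:stop], Y, lengthscale) @ b
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(2_000, 3))  # tall and skinny: N points, D = 3
    b = rng.standard_normal(2_000)
    v = kmvm_tiled(X, X, b, lengthscale=0.5, block=256)
    dense = gaussian_kernel(X, X, 0.5) @ b  # materializes all N x N entries
    print("max abs difference:", np.max(np.abs(v - dense)))
```

The `block` parameter trades peak memory against the number of dense products; removing the quadratic time term as well requires an approximation such as F3M.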
📝 Abstract
Kernel matrix-vector multiplication (KMVM) is a foundational operation in machine learning and scientific computing. However, as KMVM tends to scale quadratically in both memory and time, applications are often limited by these computational constraints. In this paper, we propose a novel approximation procedure coined \textit{Faster-Fast and Free Memory Method} (F$^3$M) to address these scaling issues of KMVM for tall~($10^8\sim 10^9$) and skinny~($D\leq7$) data. Extensive experiments demonstrate that F$^3$M has empirical \emph{linear time and memory} complexity with a relative error of order $10^{-3}$ and can compute a full KMVM for a billion points \emph{in under a minute} on a high-end GPU, leading to a significant speed-up in comparison to existing CPU methods. We demonstrate the utility of our procedure by applying it as a drop-in replacement for the KMVM in the state-of-the-art GPU-based linear solver FALKON, \emph{improving speed 1.5--5.5 times} at the cost of a $<1\%$ drop in accuracy. We further demonstrate competitive results on \emph{Gaussian process regression} coupled with significant speedups on a variety of real-world datasets.
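Linear-cost KMVM approximations generally exploit the smoothness of the kernel. As one generic illustration (not the F3M algorithm, whose internals this abstract does not describe), the sketch below interpolates a 1-D Gaussian kernel at p Chebyshev nodes, k(x, y) ≈ Σ_a Σ_c L_a(x) k(ξ_a, η_c) L_c(y), so that K b ≈ L_x K_ξη (L_yᵀ b). Setup costs O((n + m)p²) and each product costs O((n + m)p + p²) instead of O(nm). The helpers `chebyshev_nodes`, `lagrange_matrix`, and `kmvm_interp_1d` and their defaults are hypothetical, and the relative error depends on p and the lengthscale.

```python
import numpy as np


def chebyshev_nodes(p, lo, hi):
    """p Chebyshev points of the first kind, mapped from [-1, 1] to [lo, hi]."""
    k = np.arange(p)
    t = np.cos((2 * k + 1) * np.pi / (2 * p))
    return 0.5 * (lo + hi) + 0.5 * (hi - lo) * t


def lagrange_matrix(x, nodes):
    """L[i, a] = a-th Lagrange basis polynomial on `nodes`, evaluated at x[i]."""
    p = nodes.size
    L = np.ones((x.size, p))
    for a in range(p):
        for c in range(p):
            if c != a:
                L[:, a] *= (x - nodes[c]) / (nodes[a] - nodes[c])
    return L


def kmvm_interp_1d(x, y, b, lengthscale=1.0, p=16):
    """Approximate v = K(x, y) @ b for 1-D points by interpolating the kernel.

    k(x_i, y_j) ~= sum_{a, c} L_a(x_i) k(xi_a, eta_c) L_c(y_j), hence
    K @ b ~= Lx @ (K_nodes @ (Ly.T @ b)); no n x m matrix is ever formed.
    """
    xi = chebyshev_nodes(p, x.min(), x.max())   # interpolation nodes for targets
    eta = chebyshev_nodes(p, y.min(), y.max())  # interpolation nodes for sources
    Lx = lagrange_matrix(x, xi)                 # shape (n, p)
    Ly = lagrange_matrix(y, eta)                # shape (m, p)
    K_nodes = np.exp(-((xi[:, None] - eta[None, :]) ** 2) / (2.0 * lengthscale**2))
    return Lx @ (K_nodes @ (Ly.T @ b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.uniform(size=2_000), rng.uniform(size=3_000)
    b = rng.standard_normal(3_000)
    exact = np.exp(-((x[:, None] - y[None, :]) ** 2) / (2.0 * 0.5**2)) @ b
    approx = kmvm_interp_1d(x, y, b, lengthscale=0.5, p=16)
    print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

A single interpolation grid only works when the kernel is smooth over the whole domain relative to its lengthscale; hierarchical, fast-multipole-style schemes instead subdivide space into boxes and interpolate only well-separated interactions. In D dimensions a tensor-product grid has p^D nodes, which is one reason such interpolation schemes are practical mainly for low-dimensional data.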