🤖 AI Summary
Existing GPU-accelerated graph-based approximate nearest neighbor search (ANNS) methods scale poorly across multiple GPUs: they shard the dataset and search each shard independently, using the extra GPUs mainly for memory capacity rather than coordinated compute. This work introduces PathWeaver, a multi-GPU graph ANNS framework with three core contributions: (1) pipelining-based path extension, a GPU-aware pipelining mechanism that uses GPU-to-GPU communication to cut redundant search iterations; (2) ghost staging, which uses a representative dataset to choose better query starting points; and (3) direction-guided selection, which filters irrelevant points early in the search, reducing unnecessary memory accesses and distance computations. Across diverse datasets at a 95% recall rate, PathWeaver achieves a 3.24× geometric-mean speedup over state-of-the-art multi-GPU ANNS frameworks, with a peak speedup of 5.30×.
📝 Abstract
Graph-based Approximate Nearest Neighbor Search (ANNS) is widely adopted in numerous applications, such as recommendation systems, natural language processing, and computer vision. While recent work on GPU-based acceleration has significantly advanced ANNS performance, the ever-growing scale of datasets now demands efficient multi-GPU solutions. However, existing designs overlook multi-GPU scalability, resulting in naive approaches that treat additional GPUs only as a means to extend memory capacity for large datasets. This inefficiency arises from partitioning the dataset and independently searching each GPU's partition for data points similar to the queries. We therefore propose PathWeaver, a novel multi-GPU framework designed to scale and accelerate ANNS for large datasets. First, we propose pipelining-based path extension, a GPU-aware pipelining mechanism that reduces the redundant search iterations of prior work by leveraging GPU-to-GPU communication. Second, we design ghost staging, which leverages a representative dataset to identify optimal query starting points, reducing the search space for challenging queries. Finally, we introduce direction-guided selection, a data selection technique that filters irrelevant points early in the search process, minimizing unnecessary memory accesses and distance computations. Comprehensive evaluations across diverse datasets demonstrate that PathWeaver achieves a 3.24$\times$ geomean speedup and up to 5.30$\times$ speedup at a 95% recall rate over state-of-the-art multi-GPU-based ANNS frameworks.
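To make the idea of filtering irrelevant points early more concrete, here is a minimal CPU-side sketch of one way a direction-based filter could work. It is an illustration under stated assumptions, not PathWeaver's actual algorithm: the abstract does not specify the selection criterion, so the cosine test, the function name `direction_filter`, and the `min_cos` threshold are all hypothetical. The sketch keeps a neighbor only if its offset from the current node points roughly toward the query, so the discarded neighbors never need a full distance computation against the query.

```python
import math

def direction_filter(current, query, neighbors, min_cos=0.0):
    """Keep neighbors whose offset from `current` points toward `query`.

    Hypothetical criterion (an assumption, not taken from the paper):
    cosine between (query - current) and (neighbor - current) >= min_cos.
    """
    to_query = [q - c for q, c in zip(query, current)]
    q_norm = math.sqrt(sum(x * x for x in to_query))
    if q_norm == 0.0:
        # Already at the query: no direction to filter by, keep everything.
        return list(neighbors)

    kept = []
    for nb in neighbors:
        offset = [n - c for n, c in zip(nb, current)]
        o_norm = math.sqrt(sum(x * x for x in offset))
        if o_norm == 0.0:
            continue  # neighbor coincides with the current node
        cos = sum(a * b for a, b in zip(to_query, offset)) / (q_norm * o_norm)
        if cos >= min_cos:
            kept.append(nb)
    return kept

# Toy 2-D example with made-up points.
current = (0.0, 0.0)
query = (1.0, 0.0)
neighbors = [(1.0, 0.5), (-1.0, 0.0), (0.5, -2.0), (-0.2, 1.0)]
survivors = direction_filter(current, query, neighbors)
print(survivors)
```

In this toy example the two neighbors lying on the far side of the current node from the query are dropped. Raising `min_cos` makes the filter more aggressive, trading possible recall loss for fewer memory accesses and distance computations, which is the trade-off the abstract alludes to.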