🤖 AI Summary
Memory layout effects on GPU-accelerated graph-based Approximate Nearest Neighbor Search (ANNS) have been largely overlooked, despite their significant impact on memory access efficiency. Method: This paper systematically uncovers the intrinsic relationship between graph structural properties and GPU memory access patterns. We propose the first unified evaluation framework for GPU graph ANNS, comprising a graph adapter for standardized heterogeneous graph representation, a GPU-optimized traversal engine, and multiple graph reordering strategies—accompanied by quantitative metrics to assess memory layout efficacy. Our reordering methods are orthogonal to existing graph indices and require no modification to underlying algorithms. Contribution/Results: Extensive experiments across multiple datasets and state-of-the-art graph indices (e.g., HNSW, NSG) demonstrate up to 15% higher query throughput with zero accuracy loss, empirically validating that memory layout optimization is critical for accelerating graph ANNS on GPUs.
📝 Abstract
We present the first systematic investigation of graph reordering effects for graph-based Approximate Nearest Neighbor Search (ANNS) on GPUs. While graph-based ANNS has become the dominant paradigm for modern AI applications, recent approaches focus on algorithmic innovations while neglecting memory layout considerations that significantly affect execution time. Our unified evaluation framework enables systematic comparison of diverse reordering strategies across different graph indices through a graph adapter that converts arbitrary graph topologies into a common representation and a GPU-optimized graph traversal engine. We conduct a comprehensive analysis across diverse datasets and state-of-the-art graph indices, introducing metrics that quantify the relationship between structural properties and memory layout effectiveness. Our GPU-targeted reordering achieves up to 15% QPS improvements while preserving search accuracy, demonstrating that memory layout optimization is orthogonal to existing algorithmic innovations. We will release all code upon publication to facilitate reproducibility and foster further research.
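To make the idea of graph reordering concrete, here is a minimal CPU-side sketch of one classic locality-improving strategy: relabelling the nodes of a CSR-format graph in BFS order, so that nodes visited consecutively during traversal receive adjacent IDs and their adjacency lists land near each other in memory. The function names (`bfs_reorder`, `apply_reorder`) are illustrative and hypothetical; this is not the paper's GPU-targeted method, only a sketch of the general technique the abstract refers to.

```python
from collections import deque

def bfs_reorder(indptr, indices):
    """Return perm where perm[old_id] = new_id, assigning IDs in BFS order.

    Illustrative sketch only: BFS relabelling is one simple
    locality-improving reordering, not the paper's method.
    """
    n = len(indptr) - 1
    order = []                      # old IDs in visit order
    seen = [False] * n
    for start in range(n):          # cover disconnected components
        if seen[start]:
            continue
        seen[start] = True
        q = deque([start])
        while q:
            u = q.popleft()
            order.append(u)
            for v in indices[indptr[u]:indptr[u + 1]]:
                if not seen[v]:
                    seen[v] = True
                    q.append(v)
    perm = [0] * n
    for new_id, old_id in enumerate(order):
        perm[old_id] = new_id
    return perm

def apply_reorder(indptr, indices, perm):
    """Rebuild the CSR arrays under the new node labelling."""
    n = len(indptr) - 1
    inv = [0] * n                   # inv[new_id] = old_id
    for old_id in range(n):
        inv[perm[old_id]] = old_id
    new_indptr, new_indices = [0], []
    for new_id in range(n):
        old_id = inv[new_id]
        new_indices.extend(perm[v] for v in
                           indices[indptr[old_id]:indptr[old_id + 1]])
        new_indptr.append(len(new_indices))
    return new_indptr, new_indices
```

Because such a reordering only permutes node IDs and rewrites the index arrays, it leaves the graph topology, and therefore search results, unchanged, which is consistent with the abstract's claim of zero accuracy loss.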