🤖 AI Summary
To address the inefficiency of attribute-filtered approximate nearest neighbor search (filtered-ANNS) on GPUs, this paper introduces VecFlow, a high-performance GPU-accelerated system for filtered vector search. Methodologically, it features: (1) a label-centric indexing and search algorithm that significantly improves the selectivity of ANNS with filters; (2) an architecture-aware modular design that uniformly supports small and large query batches as well as single- and multi-label queries; and (3) a holistic optimization integrating GPU parallelism, label-aware indexing, hybrid batch scheduling, and customized CUDA kernels. Evaluated on an NVIDIA A100 GPU, the system achieves 5 million queries per second (QPS) at 90% recall, up to 135× faster than the CPU-based Filtered-DiskANN. It also extends to the 99% recall regime, whereas strong GPU-based baselines plateau at around 80% recall.
📝 Abstract
Vector search and database systems have become a keystone component in many AI applications. While much prior research has investigated how to accelerate generic vector search, emerging AI applications require running more sophisticated vector queries efficiently, such as vector search with attribute filters. Unfortunately, recent filtered-ANNS solutions are primarily designed for CPUs, and there has been little exploration of filtered-ANNS that takes advantage of the massive parallelism offered by GPUs, with limited performance so far. In this paper, we present VecFlow, a novel high-performance filtered vector search system that achieves unprecedentedly high throughput and recall while maintaining low latency for filtered-ANNS on GPUs. We propose a novel label-centric indexing and search algorithm that significantly improves the selectivity of ANNS with filters. In addition to algorithm-level optimization, we provide architecture-aware optimization for VecFlow's functional modules, effectively supporting both small-batch and large-batch queries, as well as single-label and multi-label query processing. Experimental results on an NVIDIA A100 GPU over several publicly available datasets show that VecFlow achieves 5 million QPS at 90% recall, outperforming state-of-the-art CPU-based solutions such as Filtered-DiskANN by up to 135 times. In addition, VecFlow readily extends to the high-recall regime of 99%, whereas strong GPU-based baselines plateau at around 80% recall. The source code is available at https://github.com/Supercomputing-System-AI-Lab/VecFlow.
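To make the idea of label-centric filtering concrete, the following is a minimal CPU-side sketch in Python, not VecFlow's actual implementation. It assumes that each vector carries a set of labels, groups vector ids by label ahead of time, and answers a single-label query by searching only that label's group. An exact brute-force scan stands in for the approximate GPU search, and the names `build_label_index` and `filtered_search` are hypothetical.

```python
from collections import defaultdict
import math


def build_label_index(labels):
    """Map each label to the ids of the vectors that carry it (hypothetical helper)."""
    index = defaultdict(list)
    for vec_id, vec_labels in enumerate(labels):
        for label in vec_labels:
            index[label].append(vec_id)
    return index


def filtered_search(query, query_label, vectors, label_index, k=2):
    """Return the ids of the k nearest vectors that carry query_label.

    Assumption: an exact scan over the label's group replaces the
    approximate search a real filtered-ANNS system would run.
    """
    candidates = label_index.get(query_label, [])
    ranked = sorted(candidates, key=lambda i: math.dist(query, vectors[i]))
    return ranked[:k]


# Toy data: four 2-D vectors, each with one or more labels.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
labels = [{"red"}, {"blue"}, {"red", "blue"}, {"red"}]

label_index = build_label_index(labels)
result = filtered_search((0.9, 0.1), "red", vectors, label_index, k=2)
print(result)  # ids of the two nearest "red" vectors
```

In this toy data, vector 1 is the closest point to the query overall, but it is labeled only "blue", so a "red" query never visits it. Restricting the candidate set before the distance computation, rather than filtering results afterward, is the point the sketch is meant to illustrate.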