VecFlow: A High-Performance Vector Data Management System for Filtered-Search on GPUs

📅 2025-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the inefficiency of attribute-filtered approximate nearest neighbor search (filtered-ANNS) on GPUs, this paper introduces VecFlow, a high-performance GPU-accelerated system for filtered vector search. Methodologically, it features: (1) a label-centric indexing and search algorithm that significantly improves the selectivity of filtered search; (2) an architecture-aware modular design that uniformly supports small and large query batches as well as single- and multi-label queries; and (3) a holistic optimization integrating GPU parallelism, label-aware indexing, hybrid batch scheduling, and customized CUDA kernels. Evaluated on an NVIDIA A100 GPU, the system achieves 5 million queries per second (QPS) at 90% recall, up to 135× faster than the CPU-based Filtered-DiskANN, and extends to the 99% recall regime, whereas prior GPU-based baselines plateau at around 80% recall.

📝 Abstract
Vector search and database systems have become a keystone component in many AI applications. While much prior research has investigated how to accelerate generic vector search, emerging AI applications require running more sophisticated vector queries efficiently, such as vector search with attribute filters. Unfortunately, recent filtered-ANNS solutions are primarily designed for CPUs, with little exploration of, and limited performance from, filtered-ANNS that takes advantage of the massive parallelism offered by GPUs. In this paper, we present VecFlow, a novel high-performance vector filtered search system that achieves unprecedented throughput and recall while maintaining low latency for filtered-ANNS on GPUs. We propose a novel label-centric indexing and search algorithm that significantly improves the selectivity of ANNS with filters. In addition to algorithm-level optimization, we provide architecture-aware optimization for VecFlow's functional modules, effectively supporting both small-batch and large-batch queries, and both single-label and multi-label query processing. Experimental results on an NVIDIA A100 GPU over several publicly available datasets validate that VecFlow achieves 5 million QPS at 90% recall, outperforming state-of-the-art CPU-based solutions such as Filtered-DiskANN by up to 135 times. VecFlow also extends readily to the high-recall 99% regime, whereas strong GPU-based baselines plateau at around 80% recall. The source code is available at https://github.com/Supercomputing-System-AI-Lab/VecFlow.
Problem

Research questions and friction points this paper is trying to address.

Efficient GPU-based filtered vector search for AI applications
Improving selectivity in ANNS with attribute filters
Achieving high throughput and recall with low latency
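
To make the problem concrete, the minimal Python sketch below shows what a filtered nearest-neighbor query is expected to return and how recall is measured against that reference. It is an exact brute-force baseline on the CPU, not VecFlow's GPU algorithm; the function names `filtered_knn` and `recall_at_k` and the toy data are hypothetical and chosen only for illustration.

```python
import math

def filtered_knn(vectors, labels, query, query_label, k):
    """Exact filtered k-NN: keep points carrying query_label, rank by L2 distance."""
    candidates = [
        (math.dist(vec, query), idx)
        for idx, (vec, labs) in enumerate(zip(vectors, labels))
        if query_label in labs
    ]
    candidates.sort()  # ascending distance, ties broken by point ID
    return [idx for _, idx in candidates[:k]]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that an approximate result recovered."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Toy data (hypothetical): four 2-D points, each with a set of labels.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
labels = [{"red"}, {"blue"}, {"red", "blue"}, {"red"}]

# Point 1 is the closest overall but lacks the "red" label, so it is excluded.
exact = filtered_knn(vectors, labels, query=(0.9, 0.1), query_label="red", k=2)
print(exact)                       # [0, 2]
print(recall_at_k([0, 3], exact))  # 0.5
```

A recall figure such as "90%" in the summary above is this kind of overlap ratio, averaged over many queries.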
Innovation

Methods, ideas, or system contributions that make the work stand out.

Label-centric indexing for filtered-ANNS on GPUs
Architecture-aware optimization for varied query types
High-throughput vector search with attribute filters
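
As a rough illustration of why organizing an index around labels helps selectivity, the Python sketch below groups point IDs by label so that a query only touches points that can pass its filter. This is a generic inverted-list illustration under stated assumptions, not the indexing or search algorithm from the paper: the names `build_label_index` and `search_by_label` are hypothetical, multi-label queries are assumed to use AND semantics, and the linear scan stands in for whatever approximate search structure a real system would use.

```python
import math
from collections import defaultdict

def build_label_index(labels):
    """Map each label to the list of point IDs that carry it (ascending IDs)."""
    index = defaultdict(list)
    for idx, labs in enumerate(labels):
        for lab in labs:
            index[lab].append(idx)
    return index

def search_by_label(index, vectors, query, query_labels, k):
    """Rank only the points that carry every label in query_labels (assumed AND)."""
    id_lists = [index.get(lab, []) for lab in query_labels]
    if not id_lists:
        return []
    candidates = set(id_lists[0]).intersection(*id_lists[1:])
    # Linear scan over the pre-filtered candidates; ties broken by point ID.
    ranked = sorted(candidates, key=lambda i: (math.dist(vectors[i], query), i))
    return ranked[:k]

# Toy data (hypothetical)
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
labels = [{"red"}, {"blue"}, {"red", "blue"}, {"red"}]
index = build_label_index(labels)

single = search_by_label(index, vectors, (0.9, 0.1), ["red"], k=2)
multi = search_by_label(index, vectors, (0.9, 0.1), ["red", "blue"], k=2)
print(single)  # [0, 2]
print(multi)   # [2]
```

Because the candidate set is restricted before any distance is computed, no work is spent on points that the filter would reject afterwards.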
Authors
Jingyi Xi
SSAIL Lab, UIUC
Chenghao Mo
MSCS student at UIUC
Benjamin Karsin
NVIDIA
Artem Chirkin
NVIDIA
Mingqin Li
Microsoft
Minjia Zhang
University of Illinois at Urbana-Champaign
Parallelism, Machine Learning Systems, Model Compression, LLM Application