SIVF: GPU-Resident IVF Index for Streaming Vector Analytics

📅 2026-01-16
🤖 AI Summary
This work addresses the inefficiency of traditional GPU-based inverted file (IVF) indexes, which employ static memory layouts that preclude in-place updates and incur frequent CPU-GPU data transfers with high latency in streaming scenarios. To overcome this limitation, the authors propose SIVF, the first GPU-native dynamic IVF index supporting efficient insertions and deletions. Key innovations include slab-based VRAM allocation, a validity bitmap, a GPU-resident address translation table (ATT), and a lock-free concurrency mechanism, collectively enabling O(1) vector location and in-place updates within GPU memory. Experiments on GIST1M demonstrate a 13,300× reduction in deletion latency (from 11.8 seconds to 0.89 ms) and 36–105× higher throughput. In sliding-window workloads, SIVF achieves end-to-end speedups of 161–266× while maintaining millisecond-level latency and incurring less than 0.8% memory overhead.
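The summary above attributes O(1) vector location and in-place deletion to the combination of a validity bitmap and a GPU-resident address translation table (ATT). As a rough intuition for how those two structures interact, here is a minimal CPU-side Python sketch; the class, names, and slab size are illustrative assumptions, not the actual SIVF implementation, which operates on VRAM slabs with GPU kernels.

```python
SLAB_SIZE = 4  # vectors per slab (tiny, for illustration only)

class ToyIVFList:
    """CPU-side toy of one inverted list with slab storage, a validity
    bitmap, and an address translation table (ATT)."""

    def __init__(self):
        self.slabs = []   # each slab holds up to SLAB_SIZE vectors
        self.valid = []   # one validity bit per allocated slot
        self.att = {}     # vector ID -> global slot index (the "ATT")

    def insert(self, vec_id, vec):
        slot = len(self.valid)
        if slot % SLAB_SIZE == 0:      # current slab full: allocate a new one
            self.slabs.append([])
        self.slabs[-1].append(vec)
        self.valid.append(True)
        self.att[vec_id] = slot        # later lookups are a single table hit

    def delete(self, vec_id):
        slot = self.att.pop(vec_id)    # O(1): no scan over the list
        self.valid[slot] = False       # in-place: just clear the bit

    def scan(self):
        # search skips invalidated slots instead of compacting memory
        for slot, ok in enumerate(self.valid):
            if ok:
                yield self.slabs[slot // SLAB_SIZE][slot % SLAB_SIZE]
```

In this toy, deleting a vector touches one dictionary entry and one bit, which is the essence of why deletion avoids the rebuild-and-retransfer cost the paper measures in the baselines; the real index additionally has to make these updates safe for concurrent GPU threads.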

📝 Abstract
The GPU-accelerated Inverted File (IVF) index is an industry standard for large-scale vector analytics, but it relies on static VRAM layouts that hinder real-time mutability. Our benchmark and analysis reveal that existing GPU IVF designs necessitate expensive CPU-GPU data transfers for index updates, causing system latency to spike from milliseconds to seconds in streaming scenarios. We present SIVF, a GPU-native index that enables high-velocity, in-place mutation via a series of new data structures and algorithms, such as conflict-free slab allocation and coalesced search on non-contiguous memory. SIVF has been implemented and integrated into the open-source vector search library Faiss. Evaluation against baselines on diverse vector datasets demonstrates that SIVF reduces deletion latency by orders of magnitude. Furthermore, distributed experiments on a 12-GPU cluster reveal that SIVF exhibits near-perfect linear scalability, achieving an aggregate ingestion throughput of 4.07 million vectors/s and a deletion throughput of 108.5 million vectors/s.
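The abstract names "conflict-free slab allocation" and a lock-free concurrency mechanism as the ingredients that let many writers insert without serializing on a lock. One common pattern behind such designs is that each writer reserves a unique slot with a single atomic fetch-and-add on a per-list cursor. The sketch below illustrates that reservation pattern only; it is an assumption-laden stand-in, using CPython's GIL-protected `itertools.count` in place of a GPU `atomicAdd`, not the SIVF allocator itself.

```python
import itertools
import threading

class SlabCursor:
    """Hands out unique slot indices to concurrent writers without a lock
    on the hot path (analogous to a GPU atomicAdd on a list cursor)."""

    def __init__(self):
        # next(itertools.count()) is effectively atomic in CPython
        self._next = itertools.count()

    def reserve(self):
        return next(self._next)

cursor = SlabCursor()
claimed = []
claimed_lock = threading.Lock()   # protects only the test bookkeeping list

def writer(n):
    mine = [cursor.reserve() for _ in range(n)]
    with claimed_lock:
        claimed.extend(mine)

threads = [threading.Thread(target=writer, args=(100,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# every slot is claimed exactly once: no conflicts, no lost updates
assert sorted(claimed) == list(range(800))
```

Because each reservation returns a distinct slot, writers never race on the same memory location, which is the property that makes in-place streaming inserts safe without coarse-grained locking.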
Problem

Research questions and friction points this paper addresses.

vector search
inverted file index
streaming data
GPU memory
real-time updates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Streaming Inverted File
GPU-resident index
in-place mutation
address translation table
vector database