🤖 AI Summary
This work addresses the I/O latency bottleneck in disk-based approximate nearest neighbor (ANN) search with LAANN, an I/O-aware graph search system that co-optimizes CPU computation and I/O access. The approach adapts its search strategy across query stages to balance I/O reduction against timely I/O issuance. Its key idea is to process candidates already cached in memory before making I/O decisions, so that computation and I/O are designed together rather than treated as separate components. This is realized through three complementary techniques: look-ahead search, a priority I/O-CPU pipeline, and a lightweight in-memory graph index. Evaluated on million- and billion-scale datasets at Recall@10 = 0.9, the system achieves 1.41–4.66× higher throughput, 29%–79% lower latency, and 1.59–6.34× fewer I/O operations than state-of-the-art disk-based ANNS systems.
📝 Abstract
Approximate nearest neighbor search (ANNS) is a fundamental primitive in large-scale retrieval, recommendation, and AI systems. As vector datasets grow to billions or even trillions of items, disk-based ANNS systems have emerged to handle this scale by keeping vector data and index structures on disk, but their query performance remains dominated by I/O latency. Existing disk-based ANNS systems primarily optimize I/O efficiency or overlap I/O with computation, but they treat CPU computation and I/O access as largely separate components. This separation misses a critical opportunity: selectively processing candidates already cached in memory before making I/O decisions can reduce unnecessary disk accesses and improve search quality. However, exploiting this opportunity is challenging: excessive computation can delay critical I/O operations, while poorly chosen computation provides little benefit, and either can increase overall query latency.
In this paper, we present LAANN, a disk-based ANNS system that makes graph search explicitly I/O-aware by co-optimizing CPU computation and I/O access. LAANN combines three techniques: look-ahead search, which adapts the search strategy across query stages to balance I/O reduction and timely I/O issuance; a priority I/O-CPU pipeline, which uses I/O waiting time to process candidates cached in memory according to their expected impact on upcoming I/O decisions; and a fast lightweight in-memory graph index, which provides high-quality initial candidates to accelerate convergence and reduce disk accesses. Experiments on million- and billion-scale datasets demonstrate that LAANN substantially outperforms state-of-the-art disk-based ANNS systems. At Recall@10 = 0.9, LAANN achieves 1.41×–4.66× higher throughput, 29%–79% lower latency, and 1.59×–6.34× fewer I/O operations.
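The opportunity described above, processing candidates that are already in memory before deciding which disk read to issue, can be illustrated with a toy simulation. The sketch below is not LAANN's algorithm: it is a minimal beam search over a brute-force k-NN graph in which expanding a node counts as one simulated disk read unless the node is in a hypothetical `cached` set. All names (`build_knn_graph`, `search`, `cached`, `cache_first`) and all parameters (dataset size, degree, beam width, cache fraction) are assumptions made for illustration. It uses exact distances and performs no real or asynchronous I/O, so it does not model the priority I/O-CPU pipeline.

```python
import math
import random


def build_knn_graph(vectors, degree):
    """Brute-force k-nearest-neighbor graph; only suitable for a toy dataset."""
    graph = {}
    for i, v in enumerate(vectors):
        others = sorted((j for j in range(len(vectors)) if j != i),
                        key=lambda j: math.dist(v, vectors[j]))
        graph[i] = others[:degree]
    return graph


def search(query, vectors, graph, cached, entry, k=10, beam=32, cache_first=True):
    """Beam search over `graph`; returns (top-k node ids, simulated disk reads).

    Expanding a node in `cached` is free; expanding any other node counts as
    one simulated disk read. With cache_first=True, the closest cached
    candidate in the beam is expanded before any uncached candidate.
    """
    def d(i):
        return math.dist(query, vectors[i])

    cand = {entry: d(entry)}  # node id -> distance; the current beam
    expanded = set()
    reads = 0
    while True:
        frontier = sorted((n for n in cand if n not in expanded), key=cand.get)
        if not frontier:
            break
        node = frontier[0]  # default: closest unexpanded candidate
        if cache_first:
            # Prefer the closest unexpanded candidate that is already in memory.
            node = next((n for n in frontier if n in cached), node)
        if node not in cached:
            reads += 1  # neighbor list would have to be fetched from disk
        expanded.add(node)
        for nb in graph[node]:
            if nb not in cand:
                cand[nb] = d(nb)
        # Keep only the `beam` closest candidates.
        keep = sorted(cand, key=cand.get)[:beam]
        cand = {n: cand[n] for n in keep}
    return sorted(cand, key=cand.get)[:k], reads


# Toy setup; every value here is an illustrative assumption.
random.seed(0)
DIM, N = 8, 400
vectors = [[random.random() for _ in range(DIM)] for _ in range(N)]
graph = build_knn_graph(vectors, degree=12)
cached = set(random.sample(range(N), N // 5))  # pretend 20% of nodes are in memory
query = [random.random() for _ in range(DIM)]
truth = sorted(range(N), key=lambda i: math.dist(query, vectors[i]))[:10]

for cache_first in (False, True):
    ids, reads = search(query, vectors, graph, cached, entry=0,
                        cache_first=cache_first)
    recall = len(set(ids) & set(truth)) / len(truth)
    print(f"cache_first={cache_first}: reads={reads}, recall@10={recall:.2f}")
```

Running the script prints the simulated read count and Recall@10 for both orderings. Whether the cache-first ordering saves reads on a given query depends on the graph, the cache contents, and the beam width, which is in line with the abstract's caution that poorly chosen computation provides little benefit.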