LAANN: I/O-Aware Look-Ahead Search for Disk-Based Approximate Nearest Neighbor Search

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the I/O latency bottleneck in disk-based approximate nearest neighbor search (ANNS) with LAANN, a system that makes graph search I/O-aware by co-optimizing CPU computation and I/O access. Its central idea is to process candidates already cached in memory before making I/O decisions, rather than treating computation and I/O as separate components as existing systems largely do. LAANN combines three techniques: look-ahead search, which adapts the search strategy across query stages to balance I/O reduction against timely I/O issuance; a priority I/O-CPU pipeline, which uses I/O waiting time to process cached candidates; and a lightweight in-memory graph index, which supplies high-quality initial candidates. On million- and billion-scale datasets at Recall@10 = 0.9, LAANN achieves 1.41–4.66× higher throughput, 29%–79% lower latency, and 1.59–6.34× fewer I/O operations than state-of-the-art disk-based ANNS systems.

📝 Abstract

Approximate nearest neighbor search (ANNS) is a fundamental primitive in large-scale retrieval, recommendation, and AI systems. As vector datasets grow to billions or even trillions of items, disk-based ANNS systems have emerged to handle this scale by storing vector data and index structures on storage systems, but their query performance remains dominated by I/O latency. Existing disk-based ANNS systems primarily optimize I/O efficiency or overlap I/O with computation, but they treat CPU computation and I/O access as largely separate components. This separation misses a critical opportunity: selectively processing candidates already cached in memory before making I/O decisions can reduce unnecessary disk accesses and improve search quality. However, exploiting this opportunity is challenging because excessive computation can delay critical I/O operations, while poorly chosen computation provides little benefit, potentially increasing overall query latency. In this paper, we present LAANN, a disk-based ANNS system that makes graph search explicitly I/O-aware by co-optimizing CPU computation and I/O access. LAANN combines three techniques: look-ahead search, which adapts the search strategy across query stages to balance I/O reduction and timely I/O issuance; a priority I/O-CPU pipeline, which uses I/O waiting time to process candidates cached in memory according to their expected impact on upcoming I/O decisions; and a fast lightweight in-memory graph index, which provides high-quality initial candidates to accelerate convergence and reduce disk accesses. Experiments on million- and billion-scale datasets demonstrate that LAANN substantially outperforms state-of-the-art disk-based ANNS systems. At Recall@10 = 0.9, LAANN achieves 1.41x-4.66x higher throughput, 29%-79% lower latency, and 1.59x-6.34x fewer I/O operations.
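
The core intuition in the abstract, expanding candidates that are already in memory before committing to a disk read, can be illustrated with a toy beam search. The sketch below is not LAANN's algorithm: the function `graph_search`, the 1-D "vectors", the `CACHED` set, and the rule "always prefer an in-memory candidate" are all assumptions made for illustration, and the sketch ignores the stage-adaptive balancing and pipelining the paper describes. It only counts simulated reads to show how the expansion order can change the number of I/Os.

```python
# Toy data (hypothetical): node id -> 1-D vector, and node id -> neighbor list.
VECTORS = {0: 0.0, 1: 6.0, 2: 5.0, 3: 9.0, 4: 10.0, 5: 7.0, 6: 11.0}
GRAPH = {0: [1, 2], 1: [5], 2: [3, 4, 6], 3: [4], 4: [3], 5: [1], 6: [4]}
CACHED = {0, 2, 3, 4, 6}   # nodes whose adjacency lists are assumed to be in memory
QUERY = 10.0


def graph_search(graph, vectors, cached, query, entry, k=2, beam=3, look_ahead=True):
    """Beam search over a graph that counts simulated disk reads.

    With look_ahead=True, an unexpanded candidate that is already cached is
    expanded before any candidate that would require a disk read.
    """
    def dist(node):
        return abs(vectors[node] - query)

    candidates = [(dist(entry), entry)]  # sorted (distance, node), at most `beam` long
    seen = {entry}
    expanded = set()
    io_reads = 0

    while True:
        unexpanded = [c for c in candidates if c[1] not in expanded]
        if not unexpanded:
            break

        pick = unexpanded[0]  # baseline choice: closest unexpanded candidate
        if look_ahead:
            in_memory = [c for c in unexpanded if c[1] in cached]
            if in_memory:
                pick = in_memory[0]  # free expansion first, I/O decision later

        node = pick[1]
        if node not in cached:
            io_reads += 1  # simulated disk read of this node's adjacency list
        expanded.add(node)

        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                candidates.append((dist(neighbor), neighbor))
        candidates = sorted(candidates)[:beam]  # keep only the best `beam` entries

    return [node for _, node in candidates[:k]], io_reads


baseline = graph_search(GRAPH, VECTORS, CACHED, QUERY, entry=0, look_ahead=False)
ahead = graph_search(GRAPH, VECTORS, CACHED, QUERY, entry=0, look_ahead=True)
print("baseline:", baseline)
print("look-ahead:", ahead)
```

On this toy graph both runs return nodes 4 and 3, but the baseline issues two simulated reads while the look-ahead variant issues none: expanding the cached node 2 first fills the beam with closer candidates, so the on-disk node 1 drops out before it is ever read. A real system also has to weigh the cost of delaying I/O that turns out to be necessary, which this sketch deliberately leaves out.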

Problem

Research questions and friction points this paper is trying to address.

Approximate Nearest Neighbor Search

Disk-based ANNS

I/O Latency

CPU-I/O Co-optimization

Look-Ahead Search

Innovation

Methods, ideas, or system contributions that make the work stand out.

I/O-Aware Search

Look-Ahead Search

Disk-Based ANNS