🤖 AI Summary
This work addresses the high memory overhead and scalability limitations of large-scale approximate nearest neighbor search (ANNS) systems that rely on in-memory graph structures such as HNSW. The authors propose Helmsman, a clustering-based ANNS system that integrates a user-space I/O stack, a learning-driven hierarchical pruning mechanism, and a GPU-accelerated indexing pipeline to enable efficient billion-scale vector retrieval on all-flash storage architectures. Helmsman substantially overcomes the latency, construction speed, and resource efficiency bottlenecks of conventional clustering approaches: it reduces hardware costs by over 90%, enables billion-scale index reconstruction within hours, and in production deployment replaces a cluster requiring 35,000 CPU cores and 0.35 PB of memory with only 40 servers, maintaining stable operation for several months.
📝 Abstract
RedNote (a.k.a., Xiaohongshu, a global-scale social network platform) widely adopts approximate nearest neighbor search (ANNS) to power its search, recommendation, and advertising services. Due to the demanding Service Level Agreements (SLAs), we have to rely on in-memory graph-based ANNS (i.e., HNSW) to provide high throughput and low latency.
However, the ever-growing user base and content volume have led to an explosive increase in memory footprint and consequently huge CapEx and OpEx. After exploring various alternatives, we find that building a clustering-based ANNS on top of all-flash servers can be promising. Yet, we still experience severe overheads from the kernel I/O stack, a fixed pruning strategy, and slow index construction.
We present HELMSMAN, a high-performance and cost-effective clustering-based ANNS system, which combines an ANNS-oriented userspace storage stack, a leveling-learned pruning module, and GPU-accelerated pipelines of construction. HELMSMAN saves over 90% of hardware costs and enables billion-scale index (re)builds within hours. In the current production deployment, operating stably for several months, 40 machines now host ANNS workloads that previously required about 35,000 cores and 0.35 PB DRAM.