Three Algorithms for Merging Hierarchical Navigable Small World Graphs

📅 2025-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the problem of efficiently merging Hierarchical Navigable Small World (HNSW) graphs, an operation needed in distributed systems, incremental indexing, and database compaction. It proposes three algorithms, Naive Graph Merge (NGM), Intra Graph Traversal Merge (IGTM), and Cross Graph Traversal Merge (CGTM), and frames merging as an iterative process with four steps: processing vertex selection, candidate collection, neighborhood construction, and information propagation. The algorithms differ mainly in how they select vertices and collect candidates. Experiments on SIFT1M show that IGTM and CGTM require up to 70% fewer distance computations than the naive approach while maintaining comparable search accuracy, and that IGTM is more efficient than CGTM, contrary to the authors' initial expectations. The result is a practical way to consolidate separately built HNSW indices in vector databases and retrieval systems.

📝 Abstract

This paper addresses the challenge of merging hierarchical navigable small world (HNSW) graphs, a critical operation for distributed systems, incremental indexing, and database compaction. We propose three algorithms for this task: Naive Graph Merge (NGM), Intra Graph Traversal Merge (IGTM), and Cross Graph Traversal Merge (CGTM). These algorithms differ in their approach to vertex selection and candidate collection during the merge process. We conceptualize graph merging as an iterative process with four key steps: processing vertex selection, candidate collection, neighborhood construction, and information propagation. Our experimental evaluation on the SIFT1M dataset demonstrates that IGTM and CGTM significantly reduce computational costs compared to naive approaches, requiring up to 70% fewer distance computations while maintaining comparable search accuracy. Surprisingly, IGTM outperforms CGTM in efficiency, contrary to our initial expectations. The proposed algorithms enable efficient consolidation of separately constructed indices, supporting critical operations in modern vector databases and retrieval systems that rely on HNSW for similarity search.
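The four-step loop described in the abstract can be sketched for a single graph layer as follows. This is a minimal illustration in the spirit of a naive merge, not the paper's NGM: the names `layer_search` and `naive_merge`, the parameters `m` and `ef`, the "keep the `m` closest" neighbor rule, and the reading of information propagation as "state carried between iterations" are all assumptions made for the example. Real HNSW graphs have multiple layers and typically use a diversity heuristic when choosing neighbors.

```python
import heapq
import math
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]            # vertex id -> neighbor ids (one layer)
Vectors = Dict[int, Tuple[float, ...]]  # vertex id -> coordinates


def layer_search(graph: Graph, vectors: Vectors, query, entry: int, ef: int) -> List[int]:
    """Best-first search on one layer; returns up to ef ids, closest first."""
    visited = {entry}
    d0 = math.dist(vectors[entry], query)
    frontier = [(d0, entry)]   # min-heap of vertices still to expand
    best = [(-d0, entry)]      # max-heap (negated distance) of current results
    while frontier:
        d, v = heapq.heappop(frontier)
        if len(best) >= ef and d > -best[0][0]:
            break              # nothing left can improve the result set
        for u in graph[v]:
            if u in visited:
                continue
            visited.add(u)
            du = math.dist(vectors[u], query)
            if len(best) < ef or du < -best[0][0]:
                heapq.heappush(frontier, (du, u))
                heapq.heappush(best, (-du, u))
                if len(best) > ef:
                    heapq.heappop(best)
    return [v for _, v in sorted((-neg, v) for neg, v in best)]


def naive_merge(graph_a: Graph, graph_b: Graph, vectors: Vectors,
                m: int = 2, ef: int = 4) -> Graph:
    """Hypothetical single-layer merge; vertex ids of the two graphs are disjoint."""
    merged: Graph = {}
    entry_a, entry_b = next(iter(graph_a)), next(iter(graph_b))
    # Step 1, processing vertex selection: visit every vertex in storage order.
    for v in list(graph_a) + list(graph_b):
        # Step 2, candidate collection: search both graphs from fixed entry points.
        cands = set(layer_search(graph_a, vectors, vectors[v], entry_a, ef))
        cands |= set(layer_search(graph_b, vectors, vectors[v], entry_b, ef))
        cands.discard(v)
        # Step 3, neighborhood construction: keep the m closest candidates
        # (simplification; HNSW implementations usually apply a heuristic).
        merged[v] = sorted(cands, key=lambda u: math.dist(vectors[u], vectors[v]))[:m]
        # Step 4, information propagation: this naive sketch carries no state
        # into the next iteration (assumed meaning of the step).
    return merged


def chain(ids) -> Graph:
    """Toy graph: each vertex linked to its predecessor and successor."""
    return {v: [u for u in (v - 1, v + 1) if ids[0] <= u <= ids[-1]] for v in ids}


# Two interleaved sets of points on a line: A at 0, 2, 4, ... and B at 1, 3, 5, ...
vectors: Vectors = {i: (2.0 * i,) for i in range(6)}
vectors.update({10 + i: (2.0 * i + 1.0,) for i in range(6)})
merged = naive_merge(chain(range(6)), chain(range(10, 16)), vectors)
print(sorted(merged[2]))  # neighbors of the point at coordinate 4.0
```

In the toy data, the point at coordinate 4.0 ends up linked to its two nearest points from the other graph, which is the cross-graph connectivity a merge is meant to create.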

Problem

Research questions and friction points this paper is trying to address.

Merging HNSW graphs for distributed systems and indexing

Reducing computational costs in graph merging algorithms

Efficient consolidation of indices for vector databases

Innovation

Methods, ideas, or system contributions that make the work stand out.

Three algorithms for merging HNSW graphs

IGTM and CGTM reduce distance computations

Efficient consolidation of separate indices
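Why a traversal-style processing order might save distance computations can be illustrated with a toy experiment. The sketch below is not IGTM or CGTM as specified in the paper. It only assumes that if consecutive processed vertices are close to each other, the search for one vertex can start from the result found for the previous one instead of from a fixed entry point. All names (`CountingDistance`, `search_nearest`, `merge_cost`, `warm_start`) are hypothetical.

```python
import heapq
import math


class CountingDistance:
    """Euclidean distance that counts how often it is evaluated."""

    def __init__(self):
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return math.dist(a, b)


def search_nearest(graph, vectors, query, entry, ef, dist):
    """Best-first search on one layer; returns the id closest to query."""
    visited = {entry}
    d0 = dist(vectors[entry], query)
    frontier = [(d0, entry)]   # min-heap of vertices still to expand
    best = [(-d0, entry)]      # max-heap (negated distance) of current results
    while frontier:
        d, v = heapq.heappop(frontier)
        if len(best) >= ef and d > -best[0][0]:
            break
        for u in graph[v]:
            if u in visited:
                continue
            visited.add(u)
            du = dist(vectors[u], query)
            if len(best) < ef or du < -best[0][0]:
                heapq.heappush(frontier, (du, u))
                heapq.heappush(best, (-du, u))
                if len(best) > ef:
                    heapq.heappop(best)
    return max(best)[1]        # largest negated distance = closest vertex


def merge_cost(graph_a, graph_b, vectors, ef, warm_start):
    """Count distance evaluations needed to locate each A vertex inside B."""
    dist = CountingDistance()
    fixed_entry = next(iter(graph_b))
    entry = fixed_entry
    for v in graph_a:          # dict order; follows the chain in the toy data
        nearest = search_nearest(graph_b, vectors, vectors[v], entry, ef, dist)
        # Warm start: carry the previous result forward as the next entry point.
        entry = nearest if warm_start else fixed_entry
    return dist.calls


def chain(ids):
    """Toy graph: each vertex linked to its predecessor and successor."""
    return {v: [u for u in (v - 1, v + 1) if ids[0] <= u <= ids[-1]] for v in ids}


n = 50
vectors = {i: (2.0 * i,) for i in range(n)}
vectors.update({100 + i: (2.0 * i + 1.0,) for i in range(n)})
graph_a, graph_b = chain(range(n)), chain(range(100, 100 + n))

cold = merge_cost(graph_a, graph_b, vectors, ef=2, warm_start=False)
warm = merge_cost(graph_a, graph_b, vectors, ef=2, warm_start=True)
print(f"fixed entry: {cold} distance computations, warm start: {warm}")
```

The chain graph exaggerates the effect, because every cold search must walk from one end of the line. The measured ratio here says nothing about the 70% figure reported on SIFT1M.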

🔎 Similar Papers

When Does Bottom-up Beat Top-down in Hierarchical Community Detection?

2023-06-01 | Journal of the American Statistical Association | Citations: 2
