🤖 AI Summary
This paper addresses the challenge of efficiently merging Hierarchical Navigable Small World (HNSW) graphs in distributed systems and incremental indexing. We propose the first systematic merge framework and design three algorithms (NGM, IGTM, and CGTM) that decompose merging into four iterative phases: vertex selection, candidate collection, neighborhood construction, and information propagation. Key contributions include: (i) the first formal problem definition of HNSW graph merging; (ii) the finding that IGTM outperforms CGTM in efficiency, contrary to our initial expectations; and (iii) novel mechanisms for adaptive neighbor reconstruction, graph-structure-aware traversal, and multi-stage information propagation. Experiments on SIFT1M show up to 70% fewer distance computations than the naive approach while maintaining comparable search accuracy, supporting real-time index merging and compaction in large-scale vector databases.
📝 Abstract
This paper addresses the challenge of merging hierarchical navigable small world (HNSW) graphs, a critical operation for distributed systems, incremental indexing, and database compaction. We propose three algorithms for this task: Naive Graph Merge (NGM), Intra Graph Traversal Merge (IGTM), and Cross Graph Traversal Merge (CGTM). These algorithms differ in their approach to vertex selection and candidate collection during the merge process. We conceptualize graph merging as an iterative process with four key steps: processing vertex selection, candidate collection, neighborhood construction, and information propagation. Our experimental evaluation on the SIFT1M dataset demonstrates that IGTM and CGTM significantly reduce computational costs compared to naive approaches, requiring up to 70% fewer distance computations while maintaining comparable search accuracy. Surprisingly, IGTM outperforms CGTM in efficiency, contrary to our initial expectations. The proposed algorithms enable efficient consolidation of separately constructed indices, supporting critical operations in modern vector databases and retrieval systems that rely on HNSW for similarity search.
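The four-step loop described above can be illustrated with a minimal, single-layer sketch. It is not the paper's NGM, IGTM, or CGTM: the names `greedy_search`, `merge_sketch`, and `CountingDistance` are hypothetical, the graphs are flat adjacency dicts rather than full multi-layer HNSW indices, neighborhood construction simply keeps the `m` closest candidates instead of an HNSW-style selection heuristic, and "information propagation" is interpreted, as an assumption, as reusing the closest vertex found for one query as the entry point for the next. The distance counter mirrors the cost metric used in the evaluation.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Graph = Dict[int, List[int]]  # vertex id -> neighbor ids (one layer only)


class CountingDistance:
    """Euclidean distance that counts how many times it is evaluated."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, a: Vector, b: Vector) -> float:
        self.calls += 1
        return math.dist(a, b)


def greedy_search(graph: Graph, vectors: Dict[int, Vector], query: Vector,
                  entry: int, ef: int, dist: Callable) -> List[int]:
    """Best-first search on one layer; returns up to ef ids, closest first."""
    visited = {entry}
    d0 = dist(query, vectors[entry])
    frontier = [(d0, entry)]   # min-heap of vertices still to expand
    best = [(-d0, entry)]      # max-heap (negated distances) of results
    while frontier:
        d, v = heapq.heappop(frontier)
        if len(best) >= ef and d > -best[0][0]:
            break              # nothing left can improve the result set
        for u in graph[v]:
            if u in visited:
                continue
            visited.add(u)
            du = dist(query, vectors[u])
            if len(best) < ef or du < -best[0][0]:
                heapq.heappush(frontier, (du, u))
                heapq.heappush(best, (-du, u))
                if len(best) > ef:
                    heapq.heappop(best)
    return [v for _, v in sorted((-nd, v) for nd, v in best)]


def merge_sketch(graph_a: Graph, graph_b: Graph, vectors: Dict[int, Vector],
                 dist: Callable, m: int = 2, ef: int = 4) -> Graph:
    """Merge two graphs with disjoint vertex ids into one adjacency dict."""
    merged: Graph = {}
    for own, other in ((graph_a, graph_b), (graph_b, graph_a)):
        entry = next(iter(other))  # arbitrary starting point in the other graph
        for v in own:  # 1. vertex selection (plain iteration order here)
            # 2. candidate collection: old neighbors plus a search in the other graph
            found = greedy_search(other, vectors, vectors[v], entry, ef, dist)
            candidates = set(own[v]) | set(found)
            # 3. neighborhood construction: keep the m closest (simplification)
            merged[v] = sorted(candidates,
                               key=lambda u: dist(vectors[v], vectors[u]))[:m]
            # 4. information propagation (assumed form): reuse the best hit
            entry = found[0]
    return merged
```

Calling `merge_sketch(graph_a, graph_b, vectors, dist=CountingDistance())` and then reading `dist.calls` gives the number of distance computations for that merge, which is the quantity the abstract uses to compare the naive and traversal-based algorithms.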