🤖 AI Summary
Existing vector databases lack relational reasoning capabilities, while graph databases are not natively optimized for high-dimensional vector retrieval, which limits both for hybrid multimodal queries. To address this, the paper proposes HMGI (Hybrid Multimodal Graph Index), a unified indexing framework that combines semantic similarity search with complex graph traversal. HMGI introduces two key innovations: (1) modality-aware partitioning of embeddings to optimize index structure and query performance, and (2) an adaptive, low-overhead index update mechanism that supports dynamic ingestion of multimodal data, drawing on the architectural principles of systems like TigerVector. The framework integrates approximate nearest neighbor search (ANNS) with a native graph database architecture that has built-in vector search, as exemplified by platforms like Neo4j. On relationship-heavy hybrid queries, HMGI aims to outperform pure vector databases such as Milvus and to achieve sub-linear query times.
📝 Abstract
The proliferation of complex, multimodal datasets has exposed a critical gap between the capabilities of specialized vector databases and traditional graph databases. While vector databases excel at semantic similarity search, they lack the capacity for deep relational querying. Conversely, graph databases master complex traversals but are not natively optimized for high-dimensional vector search. This paper introduces the Hybrid Multimodal Graph Index (HMGI), a novel framework designed to bridge this gap by creating a unified system for efficient, hybrid queries on multimodal data. HMGI leverages the native graph database architecture and integrated vector search capabilities, exemplified by platforms like Neo4j, to combine Approximate Nearest Neighbor Search (ANNS) with expressive graph traversal queries. Key innovations of the HMGI framework include modality-aware partitioning of embeddings to optimize index structure and query performance, and a system for adaptive, low-overhead index updates to support dynamic data ingestion, drawing inspiration from the architectural principles of systems like TigerVector. By integrating semantic similarity search directly with relational context, HMGI aims to outperform pure vector databases like Milvus in complex, relationship-heavy query scenarios and achieve sub-linear query times for hybrid tasks.
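To make the idea of a hybrid query concrete, the following is a minimal, illustrative sketch and not the paper's implementation. It assumes a toy in-memory structure: embeddings are kept in one partition per modality, a graph traversal from an anchor node narrows the candidate set, and the surviving candidates are ranked by cosine similarity. The class name `HybridIndexSketch` and all method names are hypothetical, and the exhaustive similarity scan is a stand-in for a real ANNS index such as the one a graph database with vector support would provide.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class HybridIndexSketch:
    """Hypothetical toy index: per-modality vector partitions plus an undirected graph."""

    def __init__(self):
        self.partitions = defaultdict(dict)  # modality -> {node_id: vector}
        self.edges = defaultdict(set)        # node_id -> set of neighbour ids

    def upsert(self, node_id, modality, vector):
        # An update touches only the partition that owns the modality,
        # which is the intuition behind low-overhead incremental updates.
        self.partitions[modality][node_id] = list(vector)

    def link(self, src, dst):
        self.edges[src].add(dst)
        self.edges[dst].add(src)

    def neighbours_within(self, start, hops):
        # Breadth-first expansion up to `hops` edges away from `start`.
        seen = {start}
        frontier = {start}
        for _ in range(hops):
            frontier = {n for f in frontier for n in self.edges[f]} - seen
            seen |= frontier
        return seen - {start}

    def hybrid_query(self, query_vec, modality, anchor, hops=1, k=3):
        # 1) Relational filter: keep nodes reachable from the anchor.
        candidates = self.neighbours_within(anchor, hops)
        # 2) Similarity ranking inside one modality partition
        #    (exhaustive scan here; a real system would use ANNS).
        part = self.partitions[modality]
        scored = [(cosine(query_vec, part[n]), n) for n in candidates if n in part]
        scored.sort(key=lambda t: (-t[0], t[1]))
        return [n for _, n in scored[:k]]


idx = HybridIndexSketch()
idx.upsert("doc1", "text", [1.0, 0.0])
idx.upsert("img1", "image", [1.0, 0.0])
idx.upsert("img2", "image", [0.0, 1.0])
idx.upsert("img3", "image", [1.0, 0.1])  # similar to the query but not linked to doc1
idx.link("doc1", "img1")
idx.link("doc1", "img2")

result = idx.hybrid_query([1.0, 0.0], "image", anchor="doc1", hops=1, k=2)
print(result)  # ['img1', 'img2']
```

In this toy data, `img3` is close to the query vector but is excluded because it has no relationship to `doc1`. A pure vector search cannot express that kind of constraint on its own, which is the gap the abstract describes.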