🤖 AI Summary
To address key challenges in hybrid retrieval-augmented generation (RAG), including difficulty with multi-hop reasoning, weak multi-entity association, low credibility of multi-source evidence, and underutilization of knowledge graphs, this paper proposes Hydra, a training-free framework. Hydra introduces: (1) a three-factor cross-source verification mechanism comprising source credibility assessment, cross-source mutual corroboration, and entity-path alignment; (2) graph-structured guidance for agent-based hybrid retrieval with early noise pruning; and (3) integrated knowledge graph topological modeling, heterogeneous semantic alignment, and cross-modal consistency verification. Evaluated on seven benchmarks, Hydra achieves state-of-the-art results with GPT-3.5, outperforming the strong hybrid baseline ToG-2 by 20.3% on average (up to 30.1%), and it enables Llama-3.1-8B to reach reasoning performance comparable to GPT-4-Turbo.
📝 Abstract
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge. Current hybrid RAG systems retrieve evidence from both knowledge graphs (KGs) and text documents to support LLM reasoning. However, they struggle with multi-hop reasoning, multi-entity questions, multi-source verification, and effective graph utilization. To address these limitations, we present Hydra, a training-free framework that unifies graph topology, document semantics, and source reliability to support deep, faithful reasoning in LLMs. Hydra handles multi-hop and multi-entity problems through agent-driven exploration that combines structured and unstructured retrieval, increasing both the diversity and the precision of evidence. To tackle multi-source verification, Hydra uses tri-factor cross-source verification (source trustworthiness assessment, cross-source corroboration, and entity-path alignment) to balance topic relevance with cross-modal agreement. By leveraging graph structure, Hydra fuses heterogeneous sources, guides efficient exploration, and prunes noise early. Comprehensive experiments on seven benchmark datasets show that Hydra achieves overall state-of-the-art results on all benchmarks with GPT-3.5, outperforming the strong hybrid baseline ToG-2 by an average of 20.3% and by up to 30.1%. Furthermore, Hydra enables smaller models (e.g., Llama-3.1-8B) to achieve reasoning performance comparable to that of GPT-4-Turbo.
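To make the tri-factor idea concrete, the following is a minimal sketch of how the three signals named above (source trustworthiness, cross-source corroboration, and entity-path alignment) might be combined into one score used to prune evidence. It is an illustration only, not Hydra's actual scoring: the `Evidence` type, the `SOURCE_TRUST` priors, the entity-overlap heuristics, and the equal weights are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    text: str
    source: str          # hypothetical source labels, e.g. "kg", "wiki", "web"
    entities: frozenset  # entities mentioned by this piece of evidence


# Hypothetical trust priors per source type; these are not values from the paper.
SOURCE_TRUST = {"kg": 0.9, "wiki": 0.8, "web": 0.5}


def corroboration(ev, pool):
    """Fraction of evidence from other sources that shares an entity with ev."""
    others = [e for e in pool if e.source != ev.source]
    if not others:
        return 0.0
    return sum(1 for e in others if ev.entities & e.entities) / len(others)


def path_alignment(ev, path_entities):
    """Fraction of entities on a candidate KG path that ev mentions."""
    if not path_entities:
        return 0.0
    return len(ev.entities & path_entities) / len(path_entities)


def tri_factor_score(ev, pool, path_entities, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three factors; the equal weights are an assumption."""
    w_trust, w_corr, w_align = weights
    return (
        w_trust * SOURCE_TRUST.get(ev.source, 0.0)
        + w_corr * corroboration(ev, pool)
        + w_align * path_alignment(ev, path_entities)
    )


def prune(pool, path_entities, keep=2):
    """Keep the top-`keep` pieces of evidence by tri-factor score."""
    ranked = sorted(
        pool,
        key=lambda e: tri_factor_score(e, pool, path_entities),
        reverse=True,
    )
    return ranked[:keep]
```

Under this sketch, evidence that comes from a trusted source, is echoed by other sources, and covers the entities on the candidate graph path ranks highest, while off-topic evidence with no overlap is dropped first. A real system would likely replace the entity-overlap heuristics with learned or LLM-based relevance judgments.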