🤖 AI Summary
This work addresses the challenge of explicitly modeling answer reasoning chains in multi-hop retrieval-augmented generation (RAG) under a fixed retrieval budget. The authors propose HKVM-RAG, a key-value-separated hypergraph evidence-organization layer. Answer-path hyperedges are assembled from cached, passage-level evidence tuples generated by a large language model and serve as retrieval keys, while the original passages serve as answer values. Rather than replacing dense retrieval, the hypergraph functions as a reusable evidence-control signal. The approach combines weighted hypergraph key-value retrieval, a fixed-substrate evaluation protocol, and a dense-aware controller that fuses frozen ColBERTv2 with HKVM rank/score features. Source-level ablations show that matched structured signals not derived from the weighted hypergraph do not reproduce its gains. On 2WikiMultiHopQA, MuSiQue, and HotpotQA, HKVM-RAG reaches F1 scores of 88.846, 65.073, and 85.810, respectively, improving on ColBERTv2 by 11.084, 6.763, and 5.966 points.
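The key-value separation described above can be illustrated with a minimal sketch. It assumes (subject, relation, object) evidence tuples, a simple "shared bridge entity" rule for building two-hop hyperedges, and a hyperedge weight equal to the number of supporting tuples. `Hyperedge`, `build_hyperedges`, and `retrieve` are hypothetical names, and the scoring rule is illustrative rather than the authors' weighting. The point it shows is that matching happens on the keys (hyperedges) while what is returned under the budget is the values (passage text).

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hyperedge:
    nodes: frozenset        # key side: entities joined along one evidence path
    passage_ids: frozenset  # value side: passages the path was assembled from
    weight: float           # hypothetical weight: number of supporting tuples


def build_hyperedges(tuple_cache):
    """Assemble hyperedge keys from cached passage-level evidence tuples.

    tuple_cache maps passage_id -> list of (subject, relation, object) tuples.
    Hypothetical rule: one hyperedge per passage, plus one two-hop hyperedge for
    every pair of passages that share a bridge entity.
    """
    entities = {pid: {e for s, _, o in tups for e in (s, o)}
                for pid, tups in tuple_cache.items()}
    edges = [Hyperedge(frozenset(ents), frozenset([pid]), float(len(tuple_cache[pid])))
             for pid, ents in entities.items() if ents]
    for a, b in combinations(sorted(entities), 2):
        if entities[a] & entities[b]:  # shared bridge entity -> candidate answer path
            edges.append(Hyperedge(frozenset(entities[a] | entities[b]),
                                   frozenset([a, b]),
                                   float(len(tuple_cache[a]) + len(tuple_cache[b]))))
    return edges


def retrieve(query_entities, edges, passages, k):
    """Match the query against keys (hyperedges); return values (passage text), top k."""
    best = {pid: 0.0 for pid in passages}
    for edge in edges:
        score = edge.weight * len(edge.nodes & query_entities)
        for pid in edge.passage_ids:
            best[pid] = max(best.get(pid, 0.0), score)
    ranked = sorted(passages, key=lambda pid: (-best[pid], pid))
    return [passages[pid] for pid in ranked[:k]]


passages = {"p1": "Alice was born in Paris.",
            "p2": "Paris is the capital of France.",
            "p3": "Bob plays chess."}
tuple_cache = {"p1": [("Alice", "born_in", "Paris")],
               "p2": [("Paris", "capital_of", "France")],
               "p3": [("Bob", "plays", "chess")]}
edges = build_hyperedges(tuple_cache)
print(retrieve({"Alice", "France"}, edges, passages, k=2))
# ['Alice was born in Paris.', 'Paris is the capital of France.']
```

Each passage alone covers only one query entity, but the two-hop hyperedge through "Paris" covers both, so the two halves of the answer chain are lifted together. A retriever that scores passages independently has no such coupling between them.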
📝 Abstract
Multi-hop RAG poses a data-engineering problem beyond passage matching: under fixed retrieval budgets, a system must organize retrieved text into evidence units that expose answer chains. Dense retrievers score passages independently, while graph-based memories make associations explicit but often rely on pairwise or entity-centered keys that fragment multi-hop evidence. We present HKVM-RAG, a key-value-separated evidence-organization layer. It assembles answer-path hyperedges from cached passage-level LLM evidence tuples and uses them as retrieval keys, while retaining passage text as answer values. To isolate key-space design, our fixed-substrate protocol holds the tuple cache, candidate passages, reader, and evaluation budget constant across pairwise graph and hypergraph variants. Weighted hypergraph key-value retrieval (WHG-KV) improves over KG-PPR by +3.426 F1 on 2WikiMultiHopQA and +3.592 F1 on MuSiQue; on HotpotQA, however, higher structured support coverage does not necessarily yield standalone answer-F1 gains. We therefore study WHG-KV as an evidence-control signal rather than a dense-retrieval replacement. Oracle and train-to-dev analyses show that support selection is repairable, and a dense-aware controller combines frozen ColBERTv2 and HKVM rank/score features using out-of-fold HKVM predictions. It reaches 88.846, 65.073, and 85.810 F1 on the three benchmarks, improving over ColBERTv2 by +11.084, +6.763, and +5.966 F1. Source-level ablations show that matched non-WHG structured signals do not reproduce the WHG-KV gains. These results provide bounded evidence that key-value-separated hypergraph organization can serve as a reusable evidence-control mechanism for multi-hop RAG.
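The abstract states that the dense-aware controller combines frozen ColBERTv2 and HKVM rank/score features using out-of-fold HKVM predictions, but it does not specify the models involved. The sketch below shows only the out-of-fold stacking pattern under loud assumptions: dense scores are precomputed inputs (ColBERTv2 itself is never called), a toy per-bucket estimate stands in for the HKVM scorer, and the controller is a logistic regression trained by SGD. All function names (`fit_hkvm_scorer`, `out_of_fold_hkvm`, `controller_features`, `train_controller`, `rerank`) and the `overlap` feature are hypothetical.

```python
import math


def reciprocal_ranks(scores):
    """Rank feature: 1 / (1 + rank), where rank 0 is the highest score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    rr = [0.0] * len(scores)
    for rank, i in enumerate(order):
        rr[i] = 1.0 / (1 + rank)
    return rr


def fit_hkvm_scorer(rows):
    """Hypothetical HKVM stand-in: P(support | overlap) per bucket, add-one smoothed."""
    counts = {}
    for overlap, label in rows:
        pos, tot = counts.get(overlap, (0, 0))
        counts[overlap] = (pos + label, tot + 1)

    def score(overlap):
        pos, tot = counts.get(overlap, (0, 0))
        return (pos + 1) / (tot + 2)
    return score


def out_of_fold_hkvm(questions, n_folds=3):
    """Score each question's candidates with a scorer that never saw that question."""
    preds = {}
    for fold in range(n_folds):
        train = [(c["overlap"], c["label"])
                 for qi, q in enumerate(questions) if qi % n_folds != fold for c in q]
        scorer = fit_hkvm_scorer(train)
        for qi, q in enumerate(questions):
            if qi % n_folds == fold:
                preds[qi] = [scorer(c["overlap"]) for c in q]
    return preds


def controller_features(question, hkvm_scores):
    """Per-candidate features: bias, frozen dense score and rank, HKVM score and rank."""
    dense = [c["dense"] for c in question]
    d_rr, h_rr = reciprocal_ranks(dense), reciprocal_ranks(hkvm_scores)
    return [[1.0, dense[j], d_rr[j], hkvm_scores[j], h_rr[j]] for j in range(len(question))]


def train_controller(X, y, epochs=200, lr=0.1):
    """Logistic-regression controller fitted by plain SGD (an assumed model choice)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(w, xi))))
            w = [wj + lr * (yi - p) * xj for wj, xj in zip(w, xi)]
    return w


def rerank(question, hkvm_scores, w):
    """Return candidate indices ordered by the fused controller score."""
    feats = controller_features(question, hkvm_scores)
    fused = [sum(a * b for a, b in zip(w, f)) for f in feats]
    return sorted(range(len(question)), key=lambda j: -fused[j])


# Toy data: "dense" stands in for precomputed ColBERTv2 scores; "overlap" is a
# hypothetical HKVM feature (query entities covered by the best hyperedge).
questions = [[{"dense": 0.3, "overlap": 0, "label": 0},
              {"dense": 0.9, "overlap": 2, "label": 1}] for _ in range(6)]
oof = out_of_fold_hkvm(questions)
X, y = [], []
for qi, q in enumerate(questions):
    X += controller_features(q, oof[qi])
    y += [c["label"] for c in q]
w = train_controller(X, y)
print(rerank(questions[0], oof[0], w))  # [1, 0]
```

The out-of-fold step is the important design choice: if the HKVM scorer had been fitted on the same questions the controller trains on, its scores would look more reliable on training data than they are on unseen questions, and the controller would learn to over-trust them.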