AI Summary
This work addresses the challenge that large language models struggle to effectively integrate cross-document evidence in multi-hop question answering. To overcome this limitation, the authors propose PAR²-RAG, a two-stage framework that first constructs a high-recall evidence frontier through breadth-first retrieval and then iteratively refines this evidence via depth-first reasoning while dynamically assessing sufficiency. By decoupling retrieval coverage from reasoning decisions, the approach simultaneously achieves high recall and adaptive inference, thereby avoiding premature commitment to low-recall retrieval paths and mitigating the drawbacks of static query formulations. Evaluated on four multi-hop QA benchmarks, PAR²-RAG substantially outperforms existing methods, yielding up to a 23.5% absolute improvement in accuracy over IRCoT and a 10.5% gain in NDCG retrieval metrics.
Abstract
Large language models (LLMs) remain brittle on multi-hop question answering (MHQA), where answering requires combining evidence across documents through retrieval and reasoning. Iterative retrieval systems can fail by locking onto an early low-recall trajectory and amplifying downstream errors, while planning-only approaches may produce static query sets that cannot adapt when intermediate evidence changes. We propose \textbf{Planned Active Retrieval and Reasoning RAG (PAR$^2$-RAG)}, a two-stage framework that separates \emph{coverage} from \emph{commitment}. PAR$^2$-RAG first performs breadth-first anchoring to build a high-recall evidence frontier, then applies depth-first refinement with evidence sufficiency control in an iterative loop. Across four MHQA benchmarks, PAR$^2$-RAG consistently outperforms state-of-the-art baselines: compared with IRCoT, it achieves up to \textbf{23.5\%} higher accuracy, with retrieval gains of up to \textbf{10.5\%} in NDCG.
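The two-stage control flow described above can be sketched in a few lines. This is a minimal illustrative toy, not the authors' implementation: all function names, the keyword-lookup "retriever", and the toy corpus are assumptions standing in for the LLM-driven planning, retrieval, and sufficiency-judgment components the paper actually uses.

```python
# Toy sketch of a coverage-then-commitment retrieval loop, in the spirit of
# PAR^2-RAG's two stages. Every name here is a hypothetical placeholder.

# Tiny stand-in corpus: maps a query to a set of evidence sentences.
TOY_CORPUS = {
    "capital of France": {"Paris is the capital of France."},
    "river of Paris": {"The Seine flows through Paris."},
}

def retrieve(query):
    """Keyword lookup standing in for a real sparse/dense retriever."""
    return TOY_CORPUS.get(query, set())

def breadth_first_anchor(question):
    """Stage 1: issue several seed sub-queries up front to build a
    high-recall evidence frontier before committing to a reasoning path.
    (Seed queries would normally be planned by an LLM.)"""
    seed_queries = ["capital of France"]
    evidence = set()
    for q in seed_queries:
        evidence |= retrieve(q)
    return evidence

def is_sufficient(question, evidence):
    """Sufficiency control: stop once evidence covers both hops.
    Here a hard-coded toy check; normally an LLM judgment."""
    text = " ".join(evidence)
    return "Seine" in text and "Paris" in text

def refine_queries(question, evidence):
    """Stage 2: derive the next query from intermediate evidence,
    so the query set adapts as evidence changes."""
    return ["river of Paris"] if any("Paris" in e for e in evidence) else []

def par2_rag_loop(question, max_iters=3):
    evidence = breadth_first_anchor(question)      # coverage
    for _ in range(max_iters):                     # iterative commitment
        if is_sufficient(question, evidence):
            break
        for q in refine_queries(question, evidence):
            evidence |= retrieve(q)
    return evidence

evidence = par2_rag_loop("Which river flows through the capital of France?")
print(sorted(evidence))
```

The key design point the sketch mirrors is the decoupling: the anchoring stage only widens coverage, while the refinement loop is the only place that adapts queries and decides, via the sufficiency check, when to stop.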