Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

2026-05-30
Citations: 0
Influential: 0

AI Summary
This work proposes Critic-R, a framework that enhances retrieval performance in multi-hop question answering by enabling synergistic optimization of retrieval and reasoning without requiring human annotations or joint training. The approach employs a critic model to evaluate the agent's introspective reasoning trajectories grounded in retrieved evidence, assessing whether the current context sufficiently supports the next reasoning step. Based on this assessment, the framework dynamically rewrites queries and refines retrieval instructions. It encompasses three key components: zero-shot query rewriting (Critic-R-Zero), instruction-tuned retrieval, and embedding optimization guided by successful or failed reasoning trajectories (Critic-Embed). Evaluated on benchmark datasets including HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle, Critic-R demonstrates substantial improvements in both retrieval quality and answer accuracy.
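The critic-driven rewriting described in this summary can be pictured as a small control loop. The sketch below is illustrative only and is not the paper's implementation: `Verdict`, `refine_and_retrieve`, the callables `retrieve`, `introspect`, `critic`, and `rewrite`, and the `max_rounds` cap are all hypothetical names and parameters introduced here. It assumes the critic returns a yes/no sufficiency judgment plus a natural-language critique.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    sufficient: bool  # does the evidence support the next reasoning step?
    feedback: str     # natural-language critique used to guide the rewrite


def refine_and_retrieve(
    query: str,
    instruction: str,
    retrieve: Callable[[str, str], List[str]],            # hypothetical retriever
    introspect: Callable[[str, List[str]], str],          # hypothetical agent trace
    critic: Callable[[str, List[str]], Verdict],          # hypothetical critic model
    rewrite: Callable[[str, str, str], Tuple[str, str]],  # hypothetical rewriter
    max_rounds: int = 3,                                  # assumed stopping rule
) -> Tuple[List[str], List[Tuple[str, str, bool]]]:
    """Retrieve, let the critic judge the agent's trace, rewrite on failure."""
    trajectory: List[Tuple[str, str, bool]] = []
    docs: List[str] = []
    for _ in range(max_rounds):
        docs = retrieve(query, instruction)
        trace = introspect(query, docs)
        verdict = critic(trace, docs)
        # Record every attempt so failed and successful steps can be reused later.
        trajectory.append((query, instruction, verdict.sufficient))
        if verdict.sufficient:
            break
        # Rewrite both the query and the retrieval instruction from the critique.
        query, instruction = rewrite(query, instruction, verdict.feedback)
    return docs, trajectory
```

Keeping the full trajectory, rather than only the final documents, is a design choice made here so that the recorded successes and failures could later serve as training signal.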
πŸ“ Abstract
Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substantial progress, optimizing retrievers for agentic search remains challenging, often requiring heavy co-training or gold-standard annotations that limit real-world applicability. We propose Critic-R, a framework that explicitly closes the feedback loop between the reasoning agent and the retrieval model during both inference and training. Critic-R introduces a critic model that evaluates the agent's introspective reasoning trace after consuming retrieved evidence to determine whether the retrieved context sufficiently supports the next reasoning step. Critic-R has two complementary mechanisms: Critic-R-Zero, an inference-time query refinement loop that iteratively rewrites queries and retrieval instructions, and Critic-Embed, an optimization approach for retrieval models that leverages successful and failed refinement trajectories as automatic supervision without requiring manual relevance annotation. We evaluate Critic-R on HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle. Results show that Critic-R significantly improves both retrieval quality and downstream answer accuracy.
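One way to read "successful and failed refinement trajectories as automatic supervision" is as a source of contrastive training triples for the retriever. The sketch below is an assumption made for illustration, since the abstract does not specify how Critic-Embed builds its training data. The names `Step` and `mine_triples` are hypothetical, and so is the pairing rule itself: documents from steps the critic judged sufficient become positives, and documents from steps judged insufficient become negatives.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    query: str        # query issued at this refinement step
    docs: List[str]   # documents retrieved for that query
    sufficient: bool  # critic's judgment for this step


def mine_triples(original_query: str, steps: List[Step]) -> List[Tuple[str, str, str]]:
    """Build (query, positive, negative) triples from one refinement trajectory.

    Assumed rule, not taken from the paper: docs from steps judged sufficient
    are positives; docs seen only in insufficient steps are negatives.
    """
    positives = list(dict.fromkeys(d for s in steps if s.sufficient for d in s.docs))
    pos_set = set(positives)
    negatives = list(
        dict.fromkeys(
            d for s in steps if not s.sufficient for d in s.docs if d not in pos_set
        )
    )
    # A trajectory with no successful step yields no triples.
    return [(original_query, p, n) for p in positives for n in negatives]
```

Triples of this shape could feed a standard contrastive or triplet objective. No manual relevance labels enter the process, which matches the abstract's claim of annotation-free supervision.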
Problem

Research questions and friction points this paper is trying to address.

agentic search
retriever optimization
complex queries
feedback loop
relevance annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic search
instruction-tuned retrievers
introspective feedback
query refinement
automatic supervision