Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes Critic-R, a framework that improves retrieval for multi-hop question answering by feeding the reasoning agent's feedback back to the retriever, without requiring human annotations or joint training. A critic model evaluates the agent's introspective reasoning trajectory, grounded in the retrieved evidence, and judges whether the current context sufficiently supports the next reasoning step. Based on this judgment, the framework rewrites queries and refines retrieval instructions. It comprises three components: zero-shot query rewriting (Critic-R-Zero), instruction-tuned retrieval, and embedding optimization guided by successful and failed reasoning trajectories (Critic-Embed). On HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle, Critic-R is reported to improve both retrieval quality and answer accuracy.
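The critic-driven refinement described above can be pictured as a short control loop. The following is a minimal sketch under stated assumptions, not the paper's implementation: the names `refine_and_retrieve`, `Verdict`, and `Attempt`, the four callables, and the `max_rounds` budget are all hypothetical stand-ins for the retriever, the reasoning agent, the critic, and the rewriter.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    """Hypothetical critic output (not a structure defined by the paper)."""
    sufficient: bool  # does the context support the next reasoning step?
    feedback: str     # natural-language critique used to guide the rewrite


@dataclass
class Attempt:
    """One retrieval round, recorded so the trajectory can be inspected later."""
    query: str
    instruction: str
    docs: List[str]
    sufficient: bool


def refine_and_retrieve(
    query: str,
    instruction: str,
    retrieve: Callable[[str, str], List[str]],
    reason: Callable[[str, List[str]], str],
    critique: Callable[[str, List[str]], Verdict],
    rewrite: Callable[[str, str, str], Tuple[str, str]],
    max_rounds: int = 3,  # assumed budget; the source does not state one
) -> List[Attempt]:
    """Retrieve, let the agent reason, ask the critic, and rewrite on failure."""
    trajectory: List[Attempt] = []
    for _ in range(max_rounds):
        docs = retrieve(query, instruction)
        trace = reason(query, docs)      # agent's introspective reasoning trace
        verdict = critique(trace, docs)  # is the evidence sufficient?
        trajectory.append(Attempt(query, instruction, docs, verdict.sufficient))
        if verdict.sufficient:
            break
        # Rewrite both the query and the retrieval instruction from the feedback.
        query, instruction = rewrite(query, instruction, verdict.feedback)
    return trajectory
```

Returning the full list of attempts, rather than only the final documents, keeps both the failed and the successful rounds available for later use.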

📝 Abstract

Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substantial progress, optimizing retrievers for agentic search remains challenging, often requiring heavy co-training or gold-standard annotations that limit real-world applicability. We propose Critic-R, a framework that explicitly closes the feedback loop between the reasoning agent and the retrieval model during both inference and training. Critic-R introduces a critic model that evaluates the agent's introspective reasoning trace after consuming retrieved evidence to determine whether the retrieved context sufficiently supports the next reasoning step. Critic-R has two complementary mechanisms: Critic-R-Zero, an inference-time query refinement loop that iteratively rewrites queries and retrieval instructions, and Critic-Embed, an optimization approach for retrieval models that leverages successful and failed refinement trajectories as automatic supervision without requiring manual relevance annotation. We evaluate Critic-R on HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle. Results show that Critic-R significantly improves both retrieval quality and downstream answer accuracy.
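One way to picture how refinement trajectories could serve as automatic supervision is to turn them into contrastive training triples. The sketch below is an assumption for illustration only; the abstract does not specify how Critic-Embed builds its training signal. The `Attempt` record and the `mine_triples` helper are hypothetical names, and the pairing rule (documents from critic-approved rounds as positives, documents seen only in rejected rounds as negatives) is a guess at one plausible scheme.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Attempt:
    """One retrieval round in a refinement trajectory (hypothetical record)."""
    query: str
    docs: List[str]
    sufficient: bool  # critic's verdict for this round


def mine_triples(trajectory: List[Attempt]) -> List[Tuple[str, str, str]]:
    """Build (query, positive_doc, negative_doc) triples without human labels.

    Assumed rule: documents from rounds the critic accepted are positives;
    documents that appear only in rejected rounds are negatives. The original
    (first-round) query is used as the anchor.
    """
    successes = [a for a in trajectory if a.sufficient]
    failures = [a for a in trajectory if not a.sufficient]
    if not successes or not failures:
        return []  # no contrast available in this trajectory

    original_query = trajectory[0].query
    # dict.fromkeys de-duplicates while preserving first-seen order.
    positives = list(dict.fromkeys(d for a in successes for d in a.docs))
    negatives = list(dict.fromkeys(
        d for a in failures for d in a.docs if d not in positives
    ))
    return [(original_query, p, n) for p in positives for n in negatives]
```

Triples of this shape are the usual input to a contrastive or triplet loss for an embedding model; which loss, if any, Critic-Embed uses is not stated in the abstract.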

Problem

Research questions and friction points this paper is trying to address.

agentic search

retriever optimization

complex queries

feedback loop

relevance annotation

Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic search

instruction-tuned retrievers

introspective feedback