🤖 AI Summary
This work addresses the challenge of redundant retrieval results and low signal-to-noise ratios in multi-turn search, which can trap agents in “tunnel vision” and lead to irreversible error accumulation. To mitigate this, the authors propose a robust exploration framework that operates without external verifiers, integrating self-evidence support (SES), dynamic prompt intervention guided by information gain scoring, a diversification branching mechanism, and group-relative policy optimization (GRPO). By dynamically adjusting prompts based on quantified information gain, the framework enhances reasoning diversity and suppresses redundancy. Evaluated on both single-hop and multi-hop question answering benchmarks, the method significantly outperforms existing approaches, achieving higher accuracy with fewer search steps, particularly in complex reasoning tasks.
📝 Abstract
Reinforcement Learning (RL) has empowered Large Language Models (LLMs) to master autonomous search for complex question answering. However, particularly within multi-turn search scenarios, this interaction introduces a critical challenge: search results often suffer from high redundancy and low signal-to-noise ratios. Consequently, agents easily fall into "Tunnel Vision," where the forced interpretation of early noisy retrievals leads to irreversible error accumulation. To address these challenges, we propose SIGHT, a framework that enhances search-based reasoning through Self-Evidence Support (SES) and Information-Gain Driven Diverse Branching. SIGHT distills search results into high-fidelity evidence via SES and calculates an Information Gain score to pinpoint pivotal states where observations maximally reduce uncertainty. This score guides Dynamic Prompting Interventions (de-duplication, reflection, or adaptive branching) to spawn new branches with SES. Finally, by integrating SES and correctness rewards via Group Relative Policy Optimization, SIGHT internalizes robust exploration strategies without external verifiers. Experiments on single-hop and multi-hop QA benchmarks demonstrate that SIGHT significantly outperforms existing approaches, particularly in complex reasoning scenarios, using fewer search steps.
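The abstract does not say how the SES and correctness rewards are combined, so the following is only a minimal sketch of the group-relative step that gives GRPO its name: each rollout's reward is normalized against the other rollouts sampled for the same question. The function names, the additive reward form, and the `ses_weight` value are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def combined_reward(correct: bool, ses_score: float, ses_weight: float = 0.5) -> float:
    """Hypothetical reward: answer correctness plus a weighted SES term.

    The additive form and the default weight are illustrative assumptions.
    """
    return (1.0 if correct else 0.0) + ses_weight * ses_score


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward by the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group: four rollouts for the same question, as (correct, ses_score) pairs.
rollouts = [(True, 0.9), (False, 0.4), (True, 0.2), (False, 0.0)]
rewards = [combined_reward(c, s) for c, s in rollouts]
advantages = group_relative_advantages(rewards)
```

Because the advantages are centered within the group, a rollout is reinforced only when it beats its siblings, which is why no learned value model or external verifier is required for this step.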