🤖 AI Summary
This work addresses the challenge of navigating multiple uncertain yet plausible reasoning paths in deep search, where answering a question requires multi-step retrieval and inference. The authors propose TreeSeeker, an inference-time framework that adopts a tree-based branch-and-return search paradigm. TreeSeeker evaluates the value, uncertainty, and risk of each branch using a textualized UCB (Upper Confidence Bound) signal, and uses a TreeMem mechanism to store branch-level evidence and failure cues, enabling controlled exploration, exploitation, and pruning. Experiments show that TreeSeeker consistently outperforms strong open-source baselines on the XBench-DeepSearch, BrowseComp, and BrowseComp-ZH benchmarks, suggesting that explicit branch-level control improves deep search performance.
📝 Abstract
Deep search requires agents to answer complex questions through multi-step web search, browsing, evidence comparison, and synthesis. A central challenge is deciding how to search when several directions look plausible but only some will later lead to reliable evidence. If an agent greedily follows the current best-looking direction, it may keep extending a weak continuation. If it explores without discipline, it may waste budget on disconnected trials. We propose TreeSeeker, an inference-time framework for controlled trial-and-error in deep search. TreeSeeker organizes search as a branch-and-return process over tree-structured states, where each branch is a tentative direction for a sub-goal. At each round, TreeSeeker reads all sub-goal trees, identifies active goals, and uses textual UCB signals of value, uncertainty, and risk to choose among three actions: exploiting a promising branch, exploring an uncertain alternative, or pruning an unproductive continuation and returning to an earlier branch point. TreeMem supports this control loop by keeping evidence, uncertainty, conflicts, progress, and failure cues attached to the branches that produced them, so trial outcomes can guide later decisions. Experiments on XBench-DeepSearch, BrowseComp, and BrowseComp-ZH show that TreeSeeker consistently outperforms strong open-source baselines, suggesting that explicit branch-and-return control complements stronger reasoning and tool execution.
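To make the exploit / explore / prune decision concrete, here is a minimal Python sketch of a UCB-style branch selector. It is an illustration under stated assumptions, not the authors' implementation: the abstract describes textual UCB signals, whereas this sketch substitutes the classic numeric UCB1 exploration bonus. The names `Branch`, `ucb_score`, and `choose_action`, the `[0, 1]` value scale, the `risk_weight` penalty, and the pruning rule (`min_visits`, `prune_below`) are all hypothetical. The `notes` list only loosely mirrors the idea of keeping evidence and failure cues attached to the branch that produced them.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Branch:
    name: str
    value: float      # estimated usefulness, assumed to lie in [0, 1]
    visits: int       # number of times this branch has been extended
    risk: float = 0.0  # assumed estimate that the branch is a dead end
    notes: list = field(default_factory=list)  # evidence / failure cues kept with the branch


def ucb_score(branch, total_visits, c=1.0, risk_weight=0.5):
    """UCB1-style score: value plus an uncertainty bonus, minus a risk penalty."""
    bonus = c * math.sqrt(math.log(total_visits + 1) / (branch.visits + 1))
    return branch.value + bonus - risk_weight * branch.risk


def choose_action(branches, c=1.0, risk_weight=0.5, min_visits=3, prune_below=0.2):
    """Return (action, branch) with action in {"prune", "exploit", "explore"}."""
    total = sum(b.visits for b in branches)

    # Prune: a branch tried several times whose risk-adjusted value stays low.
    for b in branches:
        if b.visits >= min_visits and b.value - risk_weight * b.risk < prune_below:
            b.notes.append("pruned: low value after repeated trials")
            return "prune", b

    best = max(branches, key=lambda b: ucb_score(b, total, c, risk_weight))
    greedy = max(branches, key=lambda b: b.value)
    # Exploit if the UCB choice is also the highest-value branch; otherwise
    # the choice was driven by the uncertainty bonus, so call it exploration.
    return ("exploit" if best is greedy else "explore"), best


if __name__ == "__main__":
    candidates = [Branch("official site", 0.8, 5), Branch("forum thread", 0.5, 0)]
    action, picked = choose_action(candidates)
    print(action, picked.name)
```

In this sketch an unvisited branch receives a large bonus, so it can be selected over a higher-value branch that has already been extended several times. That is the trade-off the abstract describes between following the best-looking direction and testing an uncertain alternative.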