🤖 AI Summary
This work addresses the challenge of allocating computational resources in deep search agents, where retrieval accuracy must be balanced against reasoning cost. The authors propose an Effective Token Cost (ETC) metric to systematically evaluate resource allocation strategies involving reranking and reasoning depth. They develop an end-to-end deep search framework that integrates listwise reranking with inference across multiple model scales and validate it on the BrowseComp-Plus benchmark. Their findings show that moderate reranking achieves accuracy comparable to increasing reasoning depth while consuming substantially fewer total tokens, highlighting reranking as an effective lever for improving the efficiency of deep search systems.
📝 Abstract
Deep research agents rely on iterative retrieval and reasoning to answer complex queries, but scaling test-time computation raises significant efficiency concerns. We study how to allocate reasoning budget in deep search pipelines, focusing on the role of listwise reranking. Using the BrowseComp-Plus benchmark, we analyze trade-offs between model scale, reasoning effort, reranking depth, and total token cost via a novel effective token cost (ETC) metric. Our results show that reranking consistently improves retrieval and end-to-end accuracy, and that moderate reranking often yields larger gains than increasing search-time reasoning, achieving comparable accuracy at substantially lower cost. All our code is available at https://github.com/texttron/BrowseComp-Plus.git.
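The abstract does not spell out how the effective token cost metric is computed. One natural reading, sketched here purely as an assumption (the function name and formula are hypothetical, not taken from the paper), is total token consumption normalized by end-to-end accuracy, i.e. tokens spent per correctly answered query:

```python
def effective_token_cost(total_tokens: int, accuracy: float) -> float:
    """Hypothetical ETC: tokens consumed per unit of end-to-end accuracy.

    This is an illustrative guess at the metric's shape, not the paper's
    actual definition. A lower value means the pipeline converts tokens
    into correct answers more efficiently.
    """
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return total_tokens / accuracy


# Illustrative comparison (numbers are made up): a pipeline that adds
# moderate reranking vs. one that simply reasons longer. If both reach
# similar accuracy, the cheaper token budget wins on ETC.
etc_rerank = effective_token_cost(total_tokens=600_000, accuracy=0.42)
etc_deep_reasoning = effective_token_cost(total_tokens=1_100_000, accuracy=0.44)
```

Under a metric of this shape, the abstract's claim corresponds to `etc_rerank < etc_deep_reasoning`: comparable accuracy at substantially lower token cost.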