Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods struggle to support multi-turn search agents in performing deep reasoning over extended interaction horizons. This paper introduces DeepMiner, a framework that addresses this limitation through three key components: (1) reverse construction of high-difficulty, verifiable question-answer pairs from real-world web data, ensuring challenging and reliable training tasks; (2) a dynamic sliding context window that eliminates reliance on external summarization models, sustaining nearly 100 consecutive interaction turns within a standard 32k context length; and (3) end-to-end reinforcement learning on Qwen3-32B, yielding the DeepMiner-32B model. Evaluated on search-agent benchmarks including BrowseComp-en, DeepMiner-32B achieves 33.5% accuracy, nearly 20 percentage points higher than the previous best open-source system, with consistent gains on BrowseComp-zh, XBench-DeepSearch, and GAIA.

📝 Abstract
While recent advances in reasoning models have demonstrated cognitive behaviors through reinforcement learning, existing approaches struggle to invoke deep reasoning capabilities in multi-turn agents with long-horizon interactions. We propose DeepMiner, a novel framework that elicits such abilities by introducing high-difficulty training tasks and a dynamic context window. DeepMiner presents a reverse construction method to generate complex but verifiable question-answer pairs from authentic web sources, which ensures the challenge and reliability of training data while injecting cognitive capabilities into multi-turn reasoning scenarios. We further design an elegant yet effective dynamic context management strategy for both training and inference, utilizing a sliding window mechanism while eliminating the dependency on external summarization models, thereby efficiently empowering the model to handle continuously expanding long-horizon contexts. Through reinforcement learning on Qwen3-32B, we develop DeepMiner-32B, which achieves substantial performance improvements across multiple search agent benchmarks. DeepMiner attains 33.5% accuracy on BrowseComp-en, surpassing the previous best open-source agent by almost 20 percentage points, and demonstrates consistent improvements on BrowseComp-zh, XBench-DeepSearch, and GAIA. Notably, our dynamic context management enables sustained interactions of nearly 100 turns within a standard 32k context length, effectively addressing the context limitations that constrain existing multi-turn interaction systems.
Problem

Research questions and friction points this paper is trying to address.

Enhancing deep reasoning in multi-turn agents with long interactions
Generating verifiable complex training data from authentic web sources
Managing expanding contexts efficiently without external summarization models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic context window management for long interactions
Reverse construction method for verifiable training data
Sliding window mechanism without external summarization models
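The sliding-window idea in the bullets above can be sketched roughly as follows. This is a hypothetical illustration under simple assumptions (the class, the whitespace token estimate, and the eviction policy are all invented here), not the paper's implementation:

```python
# Hypothetical sketch of a sliding-window context manager for a multi-turn
# search agent: old turns are evicted to stay within a token budget, with
# no external summarization model involved.

def approx_tokens(text: str) -> int:
    # Crude whitespace-based token estimate (illustrative only; a real
    # system would use the model's tokenizer).
    return len(text.split())

class SlidingContextWindow:
    def __init__(self, budget: int, keep_recent: int = 4):
        self.budget = budget            # max tokens allowed in the window
        self.keep_recent = keep_recent  # most recent turns are never evicted
        self.turns: list[str] = []

    def add_turn(self, turn: str) -> None:
        self.turns.append(turn)
        self._evict()

    def _evict(self) -> None:
        # Drop the oldest turns (beyond the protected recent tail) until
        # the running context fits the budget.
        def total() -> int:
            return sum(approx_tokens(t) for t in self.turns)
        while total() > self.budget and len(self.turns) > self.keep_recent:
            self.turns.pop(0)

    def context(self) -> str:
        # Concatenated context passed to the model at the next turn.
        return "\n".join(self.turns)
```

For example, with `budget=10` and `keep_recent=2`, appending four turns of 3, 3, 3, and 2 approximate tokens evicts only the first turn, leaving the three most recent turns (8 tokens) in the window.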
Qiaoyu Tang
Institute of Software, Chinese Academy of Sciences
Natural Language Processing
Hao Xiang
Waymo Research, UCLA
Le Yu
Alibaba Group
Bowen Yu
Qwen Team, Alibaba Group
Post-training, Foundation Model
Yaojie Lu
Institute of Software, Chinese Academy of Sciences
Information Extraction, Large Language Models
Xianpei Han
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences
Le Sun
Institute of Software, CAS
Information Retrieval, Natural Language Processing
WenJuan Zhang
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences
Pengbo Wang
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences
Shixuan Liu
National University of Defense Technology
Knowledge Reasoning, Domain Generalization, Causal Inference, Data Engineering
Zhenru Zhang
Qwen Team, Alibaba Group
Large Language Model
Jianhong Tu
Alibaba Group
Hongyu Lin
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences
Junyang Lin
Qwen Team, Alibaba Group & Peking University
Natural Language Processing, Cross-Modal Representation Learning, Pretraining