ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

📅 2025-09-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the interruption of search cycles that context window limits cause in LLM-based web agents on complex, knowledge-intensive queries, this paper proposes ReSum, a novel reasoning paradigm that enables long-horizon, history-aware exploration via periodic context summarization. ReSum pairs lightweight context compression with explicit preservation of the reasoning state. The authors further introduce the ReSum-GRPO training framework, which combines summary generation, segmented trajectory training, and advantage broadcasting, using the GRPO reinforcement learning algorithm to optimize summary-conditioned policies. Evaluated on three benchmarks, ReSum achieves an average absolute improvement of 4.5% over ReAct, with a maximum gain of 8.2%. Notably, WebResummer-30B, trained on only 1K samples, attains Pass@1 scores of 33.3% on BrowseComp-zh and 18.3% on BrowseComp-en, significantly outperforming existing open-source web agents.

📝 Abstract
Large Language Model (LLM)-based web agents demonstrate strong performance on knowledge-intensive tasks but are hindered by context window limitations in paradigms like ReAct. Complex queries involving multiple entities, intertwined relationships, and high uncertainty demand extensive search cycles that rapidly exhaust context budgets before reaching complete solutions. To overcome this challenge, we introduce ReSum, a novel paradigm that enables indefinite exploration through periodic context summarization. ReSum converts growing interaction histories into compact reasoning states, maintaining awareness of prior discoveries while bypassing context constraints. For paradigm adaptation, we propose ReSum-GRPO, integrating GRPO with segmented trajectory training and advantage broadcasting to familiarize agents with summary-conditioned reasoning. Extensive experiments on web agents of varying scales across three benchmarks demonstrate that ReSum delivers an average absolute improvement of 4.5% over ReAct, with further gains of up to 8.2% following ReSum-GRPO training. Notably, with only 1K training samples, our WebResummer-30B (a ReSum-GRPO-trained version of WebSailor-30B) achieves 33.3% Pass@1 on BrowseComp-zh and 18.3% on BrowseComp-en, surpassing existing open-source web agents.
Problem

Research questions and friction points this paper is trying to address.

Overcoming context window limitations in LLM-based web agents
Enabling indefinite exploration through periodic context summarization
Improving performance on complex multi-entity search tasks
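The core mechanism behind the first two points can be sketched as a rollout loop that compresses the interaction history into a compact reasoning state whenever it nears the context budget. The helper names (`count_tokens`, `summarize`, `tool_step`) and the toy budget below are illustrative stand-ins, not the paper's actual prompts, tools, or tokenizer:

```python
# Minimal sketch of ReSum-style periodic context summarization.
# All names and the token budget are hypothetical illustrations.

MAX_CONTEXT_TOKENS = 200  # toy budget; real agents use the model's window


def count_tokens(messages):
    # Toy tokenizer: whitespace word count stands in for a real one.
    return sum(len(m.split()) for m in messages)


def summarize(messages):
    # Stand-in for the summary model: compress the history into one
    # compact "reasoning state" message carrying prior discoveries.
    return ["SUMMARY: " + " | ".join(m[:20] for m in messages)]


def resum_loop(query, tool_step, max_steps=10):
    """Run an agent loop; when the history nears the context budget,
    replace it with a summary so exploration can continue indefinitely."""
    history = [query]
    for _ in range(max_steps):
        observation = tool_step(history)  # one search/browse action
        history.append(observation)
        if count_tokens(history) > MAX_CONTEXT_TOKENS:
            # Periodic compression: keep the query plus a reasoning state
            # instead of the full, ever-growing interaction history.
            history = [query] + summarize(history[1:])
    return history
```

Unlike plain ReAct, where the history grows monotonically until the window is exhausted, the loop's working context stays bounded regardless of how many search steps the query requires.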
Innovation

Methods, ideas, or system contributions that make the work stand out.

Periodic context summarization for indefinite exploration
ReSum-GRPO with segmented trajectory training
Compact reasoning states bypass context constraints
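The training-side idea can be sketched in two steps: GRPO's group-relative advantage, and broadcasting that trajectory-level advantage to every segment produced by a summarization split. This is a hedged illustration under the standard GRPO normalization; the function names and data layout are assumptions, not the paper's implementation:

```python
# Sketch of GRPO-style advantages with advantage broadcasting over
# segmented trajectories (names and layout are illustrative only).
from statistics import mean, pstdev


def group_advantages(rewards):
    # GRPO normalizes each rollout's reward against its group's
    # mean and standard deviation (no learned value function).
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]


def broadcast_to_segments(advantages, segments_per_traj):
    # Each summarization event splits a trajectory into multiple
    # training segments; all segments of a trajectory inherit that
    # trajectory's single advantage value.
    return [[adv] * n for adv, n in zip(advantages, segments_per_traj)]
```

This way a trajectory that succeeds only after several summarizations still credits every segment, including the summary-conditioned ones, with the same positive signal.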