Learning Agent-Compatible Context Management for Long-Horizon Tasks

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the degradation of reasoning performance in large language model (LLM) agents during long-horizon tasks, which arises from uncontrolled context accumulation. Existing context management approaches typically require fine-tuning the agent itself, limiting their applicability to closed-source or heterogeneous systems. To overcome this, the paper proposes Adaptive Context Management (AdaCoM), a novel framework that employs an external LLM to dynamically prune and retain context for a frozen agent, trained end-to-end via reinforcement learning. AdaCoM establishes the first general-purpose context management mechanism that operates without modifying the agent, revealing a fundamental trade-off between context fidelity and reasoning reliability. It further enables policy transfer across agents with comparable capabilities. Experiments demonstrate that AdaCoM significantly enhances performance across diverse agents on web search and deep research benchmarks, exhibiting strong generalization and reusability.

📝 Abstract

LLM agents increasingly face long-horizon tasks such as web search and deep research in real-world applications, where accumulated context can cause long-context degradation and reasoning failures. Prior work mitigates this through agent-side context control or fixed strategies such as summarization. These approaches require training the agent itself for adaptation, which is impractical for closed-source agents, and they ignore that different agents may require different strategies. We introduce Adaptive Context Management (AdaCoM), which trains an external LLM to manage the context of a frozen agent through flexible modification actions and end-to-end reinforcement learning. Across diverse agents on web search and deep research benchmarks, AdaCoM substantially improves performance by preserving task constraints and progress while pruning stale content. The learned strategies reveal a Fidelity-Reliability Trade-off: agents with higher vanilla ReAct performance benefit from higher-fidelity context preservation, whereas lower-performing agents require more aggressive compression to stay within a reliable reasoning regime. Transfer experiments show that AdaCoM generalizes most effectively across agents with similar capability (measured by vanilla ReAct performance), suggesting a practical path toward reusable context managers for agent systems.
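The separation between a frozen agent and an external context manager can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `Entry` and `Edit` types, the `keep`/`drop`/`rewrite` operations, and the hand-written `toy_manager` rule. In AdaCoM the manager is an LLM trained with reinforcement learning; that training is not shown, and a fixed recency rule stands in for the learned policy.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Entry:
    step: int   # step at which the entry was added (0 = task statement)
    role: str   # hypothetical roles: "task" or "observation"
    text: str


@dataclass
class Edit:
    op: str     # hypothetical actions: "keep", "drop", or "rewrite"
    index: int  # position of the target entry in the current context
    text: str = ""  # replacement text, used only by "rewrite"


def apply_edits(context: List[Entry], edits: List[Edit]) -> List[Entry]:
    """Return a new context with the manager's edits applied."""
    by_index = {e.index: e for e in edits}
    result = []
    for i, entry in enumerate(context):
        edit = by_index.get(i)
        if edit is None or edit.op == "keep":
            result.append(entry)
        elif edit.op == "rewrite":
            result.append(Entry(entry.step, entry.role, edit.text))
        elif edit.op != "drop":
            raise ValueError(f"unknown op: {edit.op}")
    return result


def toy_manager(context: List[Entry]) -> List[Edit]:
    """Stand-in policy: always keep the task, drop observations 2+ steps old."""
    latest = max(e.step for e in context)
    return [
        Edit("drop", i)
        for i, e in enumerate(context)
        if e.role != "task" and latest - e.step >= 2
    ]


def run_episode(
    task: str,
    agent_step: Callable[[List[Entry]], str],
    manager: Callable[[List[Entry]], List[Edit]],
    max_steps: int,
) -> List[Entry]:
    """The agent only reads the context; the manager alone modifies it."""
    context = [Entry(0, "task", task)]
    for step in range(1, max_steps + 1):
        observation = agent_step(context)  # frozen agent, never updated
        context.append(Entry(step, "observation", observation))
        context = apply_edits(context, manager(context))
    return context


def stub_agent(context: List[Entry]) -> str:
    """Placeholder for a real agent call; reports the context size it saw."""
    return f"result after reading {len(context)} entries"


final_context = run_episode("find X", stub_agent, toy_manager, max_steps=5)
```

Because the agent is passed in as an opaque callable, the same manager could in principle be paired with a different agent without changing either side, which is the property the transfer experiments rely on.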

Problem

Research questions and friction points this paper is trying to address.

long-horizon tasks

context management

LLM agents

reasoning failures

closed-source agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Context Management

LLM agents

long-horizon tasks