🤖 AI Summary
This work proposes CodeDelegator, a multi-agent framework that separates strategic planning from code implementation through role specialization. In a conventional setup, a single agent handles both high-level planning and low-level coding, so debugging traces and intermediate failures accumulate in its context and degrade long-horizon performance. CodeDelegator instead uses a persistent Delegator agent that decomposes tasks, writes specifications, and monitors progress without executing code, while a new Coder agent is instantiated for each sub-task with a clean context containing only its specification. An Ephemeral-Persistent State Separation (EPSS) mechanism isolates each Coder's transient execution state from the Delegator's global planning context, which keeps debugging traces out of that context. Experiments on multiple benchmarks are reported to demonstrate the framework's effectiveness across diverse scenarios.
📝 Abstract
Recent advances in large language models (LLMs) allow agents to represent actions as executable code, offering greater expressivity than traditional tool-calling. However, real-world tasks often demand both strategic planning and detailed implementation. Using a single agent for both leads to context pollution from debugging traces and intermediate failures, impairing long-horizon performance. We propose CodeDelegator, a multi-agent framework that separates planning from implementation via role specialization. A persistent Delegator maintains strategic oversight by decomposing tasks, writing specifications, and monitoring progress without executing code. For each sub-task, a new Coder agent is instantiated with a clean context containing only its specification, shielding it from prior failures. To coordinate between agents, we introduce Ephemeral-Persistent State Separation (EPSS), which isolates each Coder's execution state while preserving global coherence, preventing debugging traces from polluting the Delegator's context. Experiments on various benchmarks demonstrate the effectiveness of CodeDelegator across diverse scenarios.
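The division of labor described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class layout, the `SubtaskResult` record, the `attempt` callback (a stand-in for LLM code generation and execution), and the retry limit are all assumptions made for illustration. The sketch only shows the one property the abstract states: each Coder starts from its specification alone, and its debugging traces never reach the Delegator's persistent context.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SubtaskResult:
    """Hypothetical record: the only data passed from a Coder to the Delegator."""
    spec: str
    success: bool
    summary: str


@dataclass
class Coder:
    """Ephemeral agent whose context starts with its specification only."""
    spec: str
    context: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.context.append(f"SPEC: {self.spec}")

    def run(self, attempt: Callable[[str, int], str], max_tries: int = 3) -> SubtaskResult:
        # `attempt` is an assumed stand-in for generating and executing code.
        for i in range(max_tries):
            try:
                output = attempt(self.spec, i)
            except Exception as exc:
                # The debugging trace is recorded in the ephemeral context only.
                self.context.append(f"TRACE (try {i}): {exc!r}")
                continue
            return SubtaskResult(self.spec, True, output)
        return SubtaskResult(self.spec, False, f"failed after {max_tries} tries")


@dataclass
class Delegator:
    """Persistent agent: tracks progress and never executes code itself."""
    context: List[str] = field(default_factory=list)

    def delegate(self, specs: List[str], attempt: Callable[[str, int], str]) -> List[SubtaskResult]:
        results = []
        for spec in specs:
            coder = Coder(spec)  # fresh, clean context for every sub-task
            result = coder.run(attempt)
            status = "ok" if result.success else "failed"
            # Only the compact summary is persisted; coder.context is dropped.
            self.context.append(f"{spec}: {status}: {result.summary}")
            results.append(result)
        return results


def flaky_attempt(spec: str, try_index: int) -> str:
    """Toy executor that fails once on the first sub-task."""
    if spec == "parse input" and try_index == 0:
        raise ValueError("bad token")
    return f"done: {spec}"


delegator = Delegator()
results = delegator.delegate(["parse input", "write report"], flaky_attempt)
```

In this toy run the first sub-task fails once and then succeeds, yet the Delegator's context holds only two summary lines. Task decomposition and specification writing are omitted here: the list of specifications is simply passed in.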