AI Summary
This work proposes EvoDS, a self-evolving autonomous agent for automated data science. It targets two limitations of existing large language model (LLM) agents: static action spaces and inefficient long-horizon context management. Together, these prevent agents from accumulating reusable experience across tasks. EvoDS introduces an Autonomous Skill Acquisition (ASA) mechanism that dynamically expands its skill set. It also formulates context management as a learnable control problem, using an Adaptive Context Compression (ACC) strategy to optimize what is retained in long-term memory. By combining hierarchical skill composition, information bottleneck optimization, and multi-agent collaborative reinforcement learning, EvoDS achieves an average performance gain of 28.9% over current open-source agents across four diverse benchmarks. It also eliminates execution failures caused by context-length limits.
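For reference, the classical information bottleneck objective is shown below. Mapping it onto EvoDS is an assumption made here for illustration, not the paper's exact objective. $H$ stands for the full interaction history, $Z$ for the compressed context that ACC retains, and $Y$ for the task-relevant outcome. The first term penalizes keeping more of the history than necessary. The second term, weighted by $\beta > 0$, rewards keeping whatever predicts the outcome.

```latex
\min_{p(z \mid h)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(H; Z) \;-\; \beta \, I(Z; Y), \qquad \beta > 0
```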
Abstract
Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science. However, existing approaches remain fundamentally limited by their static action sets and their lack of principled long-horizon context management. These limitations hinder their ability to accumulate reusable experience across tasks and to operate reliably in multi-stage, iterative data science pipelines. To address these challenges, we introduce EvoDS, a self-evolving autonomous data science agent that learns to expand its skills and adaptively manage long-term context through agentic reinforcement learning. Specifically, EvoDS introduces two key strategies: (1) an Autonomous Skill Acquisition (ASA) mechanism, which enables agents to synthesize, validate, and reuse executable skills; and (2) an Adaptive Context Compression (ACC) strategy, which treats context management as a learned control problem rather than as passive truncation. These strategies are orchestrated within a two-stage multi-agent training scheme, enabling EvoDS to autonomously improve over time. Theoretically, we prove that EvoDS's hierarchical design reduces tool-selection error and that its optimization objective aligns with the information bottleneck principle, ensuring efficient context use. Empirically, EvoDS outperforms state-of-the-art open-source data science agents by an average of 28.9% across four diverse benchmarks while eliminating out-of-token (context-length overflow) failures. Our code and data are available at https://github.com/usail-hkust/EvoDS.
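The synthesize, validate, and reuse cycle described for ASA can be illustrated with a minimal sketch. Everything in it is hypothetical. `Skill`, `SkillLibrary`, and `compile_skill` are illustrative names, not taken from the EvoDS codebase. The sketch assumes that skills are plain Python functions delivered as source text, as an LLM might emit them. It also assumes that "validation" means passing a small set of input/output test cases.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical: a test case is (positional args, expected return value).
TestCase = tuple[tuple[Any, ...], Any]


@dataclass
class Skill:
    name: str
    fn: Callable[..., Any]
    tests: list[TestCase]


def compile_skill(name: str, source: str, tests: list[TestCase]) -> Skill:
    """Synthesize: turn generated source text into a callable skill."""
    namespace: dict[str, Any] = {}
    exec(source, namespace)  # not sandboxed here; a real agent should isolate this
    return Skill(name, namespace[name], tests)


@dataclass
class SkillLibrary:
    skills: dict[str, Skill] = field(default_factory=dict)

    def validate(self, skill: Skill) -> bool:
        """Validate: every test must run without raising and match its expected value."""
        for args, expected in skill.tests:
            try:
                if skill.fn(*args) != expected:
                    return False
            except Exception:
                return False
        return bool(skill.tests)  # a skill with no tests is never admitted

    def add(self, skill: Skill) -> bool:
        """Admit a skill only if its name is new and it passes validation."""
        if skill.name in self.skills or not self.validate(skill):
            return False
        self.skills[skill.name] = skill
        return True

    def call(self, name: str, *args: Any) -> Any:
        """Reuse: invoke a previously validated skill by name."""
        return self.skills[name].fn(*args)


library = SkillLibrary()
good = compile_skill(
    "fill_missing",
    "def fill_missing(xs, default):\n    return [default if x is None else x for x in xs]\n",
    [(([1, None, 3], 0), [1, 0, 3])],
)
bad = compile_skill("drop_missing", "def drop_missing(xs):\n    return xs\n", [(([1, None],), [1])])
print(library.add(good))                         # True
print(library.call("fill_missing", [None, 2], -1))  # [-1, 2]
print(library.add(bad))                          # False
```

The library rejects any skill that fails or raises a test, so it holds only code that has already run correctly at least once, which is what makes later reuse reasonably safe. A production agent would also need to sandbox `exec`. It could also rank stored skills by downstream task reward instead of relying only on fixed tests.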