🤖 AI Summary
Problem: Existing GUI-based autonomous agents struggle to combine long-horizon planning with high-precision execution in specialized domains such as scientific computing. General-purpose agents excel at planning but lack execution fidelity, while domain-specific agents show the opposite trade-off. Moreover, current compositional frameworks are typically static and non-trainable, which limits their adaptability in data-scarce scenarios.
Method: We propose a trainable “dual-brain coordination” framework that pairs a general-purpose planner with a domain-specialized executor and trains the planner through a two-stage pipeline. In the first stage, a decoupled GRPO (Group Relative Policy Optimization) approach trains an expert planner for each application, bootstrapped from a small set of task trajectories. In the second stage, successful trajectories from all experts are aggregated and used to fine-tune a single planner with supervised learning, which supports learning from little data and cross-domain generalization.
Results: The approach significantly outperforms open-source baselines on four scientific applications from ScienceBoard, combining robust execution with cross-domain generalization and establishing a new state of the art among open-source models.
📝 Abstract
Autonomous agents for Graphical User Interfaces (GUIs) face significant challenges in specialized domains such as scientific computing, where both long-horizon planning and precise execution are required. Existing approaches suffer from a trade-off: generalist agents excel at planning but perform poorly in execution, while specialized agents demonstrate the opposite weakness. Recent compositional frameworks attempt to bridge this gap by combining a planner and an actor, but they are typically static and non-trainable, which prevents adaptation from experience. This is a critical limitation given the scarcity of high-quality data in scientific domains. To address these limitations, we introduce CODA, a novel and trainable compositional framework that integrates a generalist planner (Cerebrum) with a specialist executor (Cerebellum), trained via a dedicated two-stage pipeline. In the first stage, Specialization, we apply a decoupled GRPO approach to train an expert planner for each scientific application individually, bootstrapping from a small set of task trajectories. In the second stage, Generalization, we aggregate all successful trajectories from the specialized experts to build a consolidated dataset, which is then used for supervised fine-tuning of the final planner. This equips CODA with both robust execution and cross-domain generalization. Evaluated on four challenging applications from the ScienceBoard benchmark, CODA significantly outperforms baselines and establishes a new state of the art among open-source models.
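The two training stages can be illustrated with a minimal sketch. It is not CODA's implementation: it shows only the standard group-relative advantage used in GRPO (each rollout's reward normalized by the mean and standard deviation of the rollouts sampled for the same task) and a filter that pools successful trajectories from per-application experts into one supervised fine-tuning set. The paper's "decoupled" GRPO variant is not reproduced, and the `Trajectory` fields, the `success_threshold` parameter, the application names, and both function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trajectory:
    # Hypothetical record of one planner rollout; field names are illustrative.
    app: str            # application the task comes from
    actions: list[str]  # high-level plan steps emitted by the planner
    reward: float       # task-level reward, e.g. 1.0 on success, 0.0 on failure


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Stage 1 (sketch): GRPO-style advantages for one group of rollouts.

    A_i = (r_i - mean(r)) / (std(r) + eps): each rollout is scored relative
    to the other rollouts for the same task, so no learned critic is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def build_sft_dataset(
    experts: dict[str, list[Trajectory]], success_threshold: float = 1.0
) -> list[Trajectory]:
    """Stage 2 (sketch): pool successful trajectories from every per-app expert."""
    dataset: list[Trajectory] = []
    for trajectories in experts.values():
        dataset.extend(t for t in trajectories if t.reward >= success_threshold)
    return dataset


if __name__ == "__main__":
    # Four rollouts for one task: two succeed, two fail.
    advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print([round(a, 3) for a in advantages])  # → [1.0, -1.0, -1.0, 1.0]

    # Hypothetical per-application expert outputs.
    experts = {
        "app_a": [Trajectory("app_a", ["open", "run"], 1.0),
                  Trajectory("app_a", ["open"], 0.0)],
        "app_b": [Trajectory("app_b", ["load", "plot"], 1.0)],
    }
    print(len(build_sft_dataset(experts)))  # → 2
```

Normalizing within the group is what lets GRPO drop the value network that PPO relies on: a rollout is rewarded only for doing better than its siblings on the same task. When every rollout in a group gets the same reward, all advantages are zero and that group contributes no policy-gradient signal, which is one reason a small bootstrap set of trajectories matters in data-scarce domains.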