CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

📅 2025-08-27
🤖 AI Summary
Existing GUI-based autonomous agents struggle to achieve both long-horizon planning and high-precision execution in specialized domains such as scientific computing. Generalist agents excel at planning but lack execution fidelity, while specialized agents show the opposite weakness. Moreover, current compositional frameworks are typically static and non-trainable, so they cannot adapt from experience, which is a serious limitation where high-quality data is scarce. Method: CODA is a trainable “dual-brain” compositional framework that pairs a generalist planner (Cerebrum) with a specialist executor (Cerebellum) and trains it with a two-stage pipeline. In the Specialization stage, a decoupled GRPO (Group Relative Policy Optimization) approach trains an expert planner for each scientific application individually, bootstrapping from a small set of task trajectories. In the Generalization stage, the successful trajectories from all specialized experts are aggregated into a consolidated dataset, which is used for supervised fine-tuning of the final planner. Results: On four applications from the ScienceBoard benchmark, CODA significantly outperforms baselines and establishes a new state of the art among open-source models.

📝 Abstract
Autonomous agents for Graphical User Interfaces (GUIs) face significant challenges in specialized domains such as scientific computing, where both long-horizon planning and precise execution are required. Existing approaches suffer from a trade-off: generalist agents excel at planning but perform poorly in execution, while specialized agents demonstrate the opposite weakness. Recent compositional frameworks attempt to bridge this gap by combining a planner and an actor, but they are typically static and non-trainable, which prevents adaptation from experience. This is a critical limitation given the scarcity of high-quality data in scientific domains. To address these limitations, we introduce CODA, a novel and trainable compositional framework that integrates a generalist planner (Cerebrum) with a specialist executor (Cerebellum), trained via a dedicated two-stage pipeline. In the first stage, Specialization, we apply a decoupled GRPO approach to train an expert planner for each scientific application individually, bootstrapping from a small set of task trajectories. In the second stage, Generalization, we aggregate all successful trajectories from the specialized experts to build a consolidated dataset, which is then used for supervised fine-tuning of the final planner. This equips CODA with both robust execution and cross-domain generalization. Evaluated on four challenging applications from the ScienceBoard benchmark, CODA significantly outperforms baselines and establishes a new state of the art among open-source models.
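The Specialization stage relies on GRPO (Group Relative Policy Optimization). The abstract does not spell out what "decoupled" changes, so the sketch below only illustrates the standard group-relative advantage that GRPO is built on: several rollouts are sampled for the same task, and each rollout's reward is normalized against the mean and standard deviation of its group. The function name, the `eps` constant, and the choice of population standard deviation are assumptions made for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of rollouts sampled for the same task.

    Hypothetical helper: each advantage is (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative assumption
    # eps keeps the division finite when every rollout receives the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four planner rollouts for one task: two succeeded (reward 1), two failed (reward 0).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

With rewards `[1, 0, 0, 1]` the successful rollouts get advantages close to +1 and the failed ones close to -1. When all rollouts in a group receive the same reward, every advantage is zero and that group contributes no learning signal.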
Problem

Research questions and friction points this paper is trying to address.

Bridging the planning-execution gap in GUI agents
Overcoming static, non-trainable compositional frameworks
Addressing data scarcity in scientific computing domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled reinforcement learning for dual-brain integration
Two-stage training pipeline with specialization and generalization
Compositional framework combining generalist planner and specialist executor
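
The Generalization stage of the two-stage pipeline can be sketched as a simple filter-and-merge step: keep only the successful trajectories produced by each specialized expert planner and flatten them into one supervised fine-tuning dataset. Everything in this sketch is hypothetical, including the `Trajectory` fields, the `build_sft_dataset` helper, and the placeholder application names. The source does not describe the actual data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Trajectory:
    """Hypothetical record of one task attempt by a specialized planner."""
    task: str
    steps: List[Tuple[str, str]]  # (observation, plan) pairs; format is assumed
    success: bool


def build_sft_dataset(by_app: Dict[str, List[Trajectory]]) -> List[dict]:
    """Merge successful trajectories from all per-application experts."""
    dataset = []
    for app, trajectories in by_app.items():
        for traj in trajectories:
            if not traj.success:
                continue  # failed attempts are excluded from the SFT data
            for observation, plan in traj.steps:
                dataset.append(
                    {"app": app, "task": traj.task, "input": observation, "target": plan}
                )
    return dataset


if __name__ == "__main__":
    # Placeholder application names, not the benchmark's real ones.
    rollouts = {
        "app_a": [
            Trajectory("t1", [("obs1", "plan1"), ("obs2", "plan2")], True),
            Trajectory("t2", [("obs3", "plan3")], False),
        ],
        "app_b": [Trajectory("t3", [("obs4", "plan4")], True)],
    }
    print(len(build_sft_dataset(rollouts)))
```

Filtering on success means the final planner is only trained on behavior that completed its task, which is one plausible reading of "aggregate all successful trajectories" in the abstract.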