🤖 AI Summary
This work addresses the engineering burden of robot reinforcement learning (RL) in sim-to-real settings, where building tasks, shaping rewards, and tuning hyperparameters demand substantial expert effort and scale poorly. The authors propose HARBOR, an agentic framework that treats robot RL automation as a “harness-engineering” problem: given a simulator codebase and a task specification, it automates the workflow from environment setup to policy training in simulation. The pipeline is decomposed into bounded stages executed by specialized agents that coordinate through standardized commands, persistent artifacts, executable gates, and reusable knowledge, while decentralized parallel trials and experience learning across runs scale the iteration. Evaluated on 6 benchmarks and 16 tasks spanning manipulation, locomotion, and bimanual dexterous control, HARBOR designs rewards and tunes algorithms to match or improve over default configurations at practical token and wall-clock cost, and the resulting policies can be transferred to real robots.
📝 Abstract
Reinforcement learning (RL) has become a powerful paradigm for robot learning, particularly in sim-to-real settings, but its broader adoption remains limited by the engineering pipeline surrounding the algorithms. Building tasks, shaping rewards, and tuning hyperparameters require substantial expert effort, making RL workflows costly and difficult to scale. We introduce HARBOR, an agentic framework that frames robot RL automation as a harness-engineering problem: given a simulator codebase and a task specification, it automates the workflow from environment setup to policy training in simulation. HARBOR decomposes such high-level objectives into bounded stages executed by specialized agents through standardized commands, persistent artifacts, executable gates, and reusable knowledge, and scales iteration via decentralized parallel trials and experience learning across runs. We evaluate HARBOR across 6 benchmarks and 16 tasks in total, spanning manipulation, locomotion, and bimanual dexterous control. We demonstrate that HARBOR automates the simulation RL workflow end-to-end, designs rewards, tunes algorithms to match or improve over default configurations, and reduces engineering effort at practical token and wall-clock cost; the resulting policies can also be transferred to real robots.