🤖 AI Summary
Soft-bodied robots suffer from significant kinematic and dynamic uncertainties, which hinder high-precision whole-body control and effective sim-to-real transfer. This paper introduces a motion-primitive-guided reinforcement learning framework that enables policy training in a high-fidelity MuJoCo simulation running up to 350× real time on a single thread, followed by zero-shot deployment on the large-scale Baloo soft robotic platform—which features a 10 kg payload capacity, dual continuum soft arms, and full 6-DoF whole-body manipulation. To our knowledge, this is the first demonstration of zero-shot real-world transfer for contact-rich whole-body tasks without any real-world fine-tuning, achieving an 88% task success rate; the learned policies further exhibit robust regrasping and disturbance-recovery behavior. The key contributions are: (1) embedding motion primitives directly into the RL policy space to circumvent the challenges of conventional reward shaping; and (2) an efficient simulation–hardware co-design paradigm that empirically validates the feasibility of zero-shot sim-to-real transfer for large-scale soft robotic systems.
📝 Abstract
Whole-body manipulation is a powerful yet underexplored approach that enables robots to interact with large, heavy, or awkward objects using more than just their end-effectors. Soft robots, with their inherent passive compliance, are particularly well-suited for such contact-rich manipulation tasks, but their kinematic and dynamic uncertainties pose significant challenges for simulation and control. In this work, we address this challenge with a simulation that runs up to 350× real time on a single thread in MuJoCo, and we provide a detailed analysis of its critical tradeoffs between speed and accuracy. Using this framework, we demonstrate a successful zero-shot sim-to-real transfer of a learned whole-body manipulation policy, achieving an 88% success rate on the Baloo hardware platform. We show that guiding RL with a simple motion primitive is critical to this success: standard reward-shaping methods struggled to produce a stable, successful policy for whole-body manipulation. Furthermore, our analysis reveals that the learned policy does not simply mimic the motion primitive; it exhibits beneficial reactive behaviors, such as re-grasping and perturbation recovery. We analyze and contrast this learned policy against an open-loop baseline to show that the policy can also exhibit aggressive over-corrections under perturbation. To our knowledge, this is the first demonstration of forceful, six-DoF whole-body manipulation using two continuum soft arms on a large-scale platform (10 kg payloads), with zero-shot policy transfer.
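The abstract's central idea—guiding RL with a motion primitive rather than relying on reward shaping—is commonly realized by having the policy output a bounded residual on top of an open-loop primitive. The paper does not specify its implementation; the sketch below is a minimal illustration of that general pattern, with a hypothetical minimum-jerk lifting primitive and made-up parameter values (`lift_height`, `duration`, `scale` are assumptions, not values from the paper):

```python
import numpy as np

def primitive_action(t, lift_height=0.3, duration=2.0):
    """Hypothetical open-loop lifting primitive: a minimum-jerk
    vertical trajectory for the arm tips (illustration only)."""
    s = min(t / duration, 1.0)
    # Minimum-jerk interpolation from 0 to lift_height over `duration`.
    z = lift_height * (10 * s**3 - 15 * s**4 + 6 * s**5)
    return np.array([0.0, 0.0, z])

def guided_action(policy_residual, t, scale=0.05):
    """Combine the primitive with a small, clipped learned residual --
    the general idea behind primitive-guided policies: the primitive
    supplies the nominal motion, the policy only corrects around it."""
    return primitive_action(t) + scale * np.clip(policy_residual, -1.0, 1.0)
```

Because the residual is clipped and scaled, the policy can only make bounded corrections around the primitive, which is one plausible reason such guidance stabilizes training where pure reward shaping fails.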