🤖 AI Summary
Locomotion policies are hard to generalize across legged robots with different morphologies, because observation spaces, action spaces, and dynamics all differ from one platform to the next.
Method: The paper proposes Multi-Loco, a morphology-agnostic framework that co-learns a diffusion model and a residual policy: (1) a generative diffusion model, trained on data pooled across platforms, learns morphology-invariant locomotion priors; and (2) a lightweight residual reinforcement learning policy, built upon PPO and shared across all embodiments, refines the diffusion model's actions for each morphology and task.
Contribution/Results: To our knowledge, this is the first framework to deploy a unified policy across four heterogeneous legged platforms, including wheeled bipeds, with successful real-world transfer. The method is evaluated in simulation and on real hardware. Compared with a standard PPO baseline, it improves average return by 10.35%, with gains of up to 13.57% on wheeled-biped locomotion tasks. The results point to improved robustness and cross-morphology generalization.
📝 Abstract
Generalizing locomotion policies across diverse legged robots with varying morphologies is a key challenge due to differences in observation/action dimensions and system dynamics. In this work, we propose Multi-Loco, a novel unified framework combining a morphology-agnostic generative diffusion model with a lightweight residual policy optimized via reinforcement learning (RL). The diffusion model captures morphology-invariant locomotion patterns from diverse cross-embodiment datasets, improving generalization and robustness. The residual policy is shared across all embodiments and refines the actions generated by the diffusion model, enhancing task-aware performance and robustness for real-world deployment. We evaluate our method on four legged robots in both simulation and real-world experiments. Compared to a standard RL framework with PPO, our approach, which replaces the Gaussian policy with a diffusion model plus a residual term, achieves a 10.35% average return improvement, with gains of up to 13.57% in wheeled-biped locomotion tasks. These results highlight the benefits of cross-embodiment data and composite generative architectures in learning robust, generalized locomotion skills.
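The composition described above, a base action from the diffusion model plus a correction from a shared residual policy, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The zero-padding scheme for unifying observation and action dimensions, the sizes `MAX_OBS_DIM` and `MAX_ACT_DIM`, the `residual_scale` factor, and both stand-in functions are assumptions made for illustration. In particular, `base_action` just draws bounded random values where a real diffusion model would run iterative denoising, and `residual_action` is a single linear layer where the paper trains a policy with RL.

```python
import numpy as np

MAX_OBS_DIM = 48  # hypothetical unified observation size
MAX_ACT_DIM = 12  # hypothetical unified action size


def pad_to(vec, size):
    """Zero-pad a 1-D vector to a fixed size (assumed unification scheme)."""
    if vec.shape[0] > size:
        raise ValueError("vector is longer than the unified dimension")
    out = np.zeros(size, dtype=np.float64)
    out[: vec.shape[0]] = vec
    return out


def base_action(obs_padded, rng):
    """Stand-in for sampling from the diffusion prior (hypothetical).

    A real diffusion model would denoise iteratively, conditioned on
    obs_padded; here we only draw a bounded random action.
    """
    return np.tanh(rng.standard_normal(MAX_ACT_DIM))


def residual_action(obs_padded, base, weights):
    """Stand-in for the shared residual policy: one linear layer (hypothetical)."""
    features = np.concatenate([obs_padded, base])
    return np.tanh(weights @ features)


def compose_action(obs, act_dim, weights, rng, residual_scale=0.1):
    """Return base action plus scaled residual, cut to the robot's action size."""
    obs_padded = pad_to(obs, MAX_OBS_DIM)
    base = base_action(obs_padded, rng)
    residual = residual_action(obs_padded, base, weights)
    full = base + residual_scale * residual
    return full[:act_dim]


# One set of residual weights is reused for two robots with different sizes.
rng = np.random.default_rng(0)
weights = 0.01 * rng.standard_normal((MAX_ACT_DIM, MAX_OBS_DIM + MAX_ACT_DIM))
quadruped_action = compose_action(np.zeros(45), act_dim=12, weights=weights, rng=rng)
biped_action = compose_action(np.zeros(30), act_dim=8, weights=weights, rng=rng)
```

Keeping the residual small relative to the base action is one common design choice in residual RL: the prior sets the overall motion and the learned term only nudges it. Whether Multi-Loco scales its residual this way is not stated in the abstract.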