🤖 AI Summary
To achieve robust, constraint-satisfying locomotion for quadrupedal robots on rapidly changing terrain, this paper proposes a framework that integrates proprioceptive planning with reinforcement learning (RL), inspired by Model Predictive Control (MPC). The framework adds an internal model, consisting of a velocity estimator and a Dreamer module, and trains it jointly with an expert policy so that the two are co-dependent, which facilitates exploration. At deployment, the Dreamer module solves an infinite-horizon MPC problem, adapting actions and velocity commands to respect constraints. The stated aim is to combine the constraint handling and interpretability of MPC with the emergent motions of model-free RL. The approach is evaluated on multi-terrain scenarios in both simulation and hardware. Ablation studies on the internal model components are used to validate the training framework and show improved robustness to training noise.
📝 Abstract
A core strength of Model Predictive Control (MPC) for quadrupedal locomotion has been its ability to enforce constraints and provide interpretability of the sequence of commands over the horizon. However, despite being able to plan, MPC struggles to scale with task complexity, often failing to achieve robust behavior on rapidly changing surfaces. On the other hand, model-free Reinforcement Learning (RL) methods have outperformed MPC on multiple terrains and exhibit emergent motions, but they inherently lack any ability to handle constraints or perform planning. To address these limitations, we propose a framework that integrates proprioceptive planning with RL, allowing for agile and safe locomotion behaviors over the planning horizon. Inspired by MPC, we incorporate an internal model that includes a velocity estimator and a Dreamer module. During training, the framework learns an expert policy and an internal model that are co-dependent, facilitating exploration for improved locomotion behaviors. During deployment, the Dreamer module solves an infinite-horizon MPC problem, adapting actions and velocity commands to respect the constraints. We validate the robustness of our training framework through ablation studies on internal model components and demonstrate improved robustness to training noise. Finally, we evaluate our approach across multi-terrain scenarios in both simulation and hardware.