Learning to Walk with Less: a Dyna-Style Approach to Quadrupedal Locomotion

📅 2025-09-07
🤖 AI Summary
Problem: Reinforcement learning (RL)-based policy training for quadrupedal robot locomotion suffers from low sample efficiency and requires extensive environment interaction to reach robust performance. Method: This paper proposes a Dyna-style, model-augmented framework that appends short-horizon synthetic transitions, generated by a predictive dynamics model trained alongside the policy, to the end of standard rollouts in the PPO policy update. The synthetic transitions are integrated gradually through a schedule based on the policy update iteration, a design guided by an ablation study showing a strong correlation between sample efficiency and rollout length. Contribution/Results: Evaluated in simulation on the Unitree Go1 robot, the framework replaces part of the simulated steps with synthetic ones, which mimics extended rollouts while improving policy return and reducing variance. The improvement carries over to tracking a wide range of locomotion commands with fewer simulated steps.

📝 Abstract
Traditional RL-based locomotion controllers often suffer from low data efficiency, requiring extensive interaction to achieve robust performance. We present a model-based reinforcement learning (MBRL) framework that improves sample efficiency for quadrupedal locomotion by appending synthetic data to the end of standard rollouts in PPO-based controllers, following the Dyna-Style paradigm. A predictive model, trained alongside the policy, generates short-horizon synthetic transitions that are gradually integrated using a scheduling strategy based on the policy update iterations. Through an ablation study, we identified a strong correlation between sample efficiency and rollout length, which guided the design of our experiments. We validated our approach in simulation on the Unitree Go1 robot and showed that replacing part of the simulated steps with synthetic ones not only mimics extended rollouts but also improves policy return and reduces variance. Finally, we demonstrate that this improvement transfers to the ability to track a wide range of locomotion commands using fewer simulated steps.
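The idea of appending synthetic data to the end of a standard rollout can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the scalar state, the `Transition` tuple layout, the `extend_rollout` helper, and the toy `policy` and `model` callables. It shows only the Dyna-style pattern of continuing a real rollout with model-generated steps before the batch is handed to a policy update.

```python
from typing import Callable, List, Tuple

# Hypothetical layout: (state, action, reward, next_state), all scalars for brevity.
Transition = Tuple[float, float, float, float]


def extend_rollout(
    real: List[Transition],
    policy: Callable[[float], float],
    model: Callable[[float, float], Tuple[float, float]],
    synthetic_steps: int,
) -> List[Transition]:
    """Append model-generated transitions to the end of a real rollout.

    `model(state, action)` is assumed to return (next_state, reward).
    """
    extended = list(real)
    if not real or synthetic_steps <= 0:
        return extended
    # Continue from the last state reached in the real rollout.
    state = real[-1][3]
    for _ in range(synthetic_steps):
        action = policy(state)
        next_state, reward = model(state, action)
        extended.append((state, action, reward, next_state))
        state = next_state
    return extended


# Toy stand-ins (assumptions): a proportional policy and a hand-written "learned" model.
def toy_policy(state: float) -> float:
    return -0.5 * state


def toy_model(state: float, action: float) -> Tuple[float, float]:
    next_state = state + action
    return next_state, -abs(next_state)


real_rollout = [(1.0, -0.5, -0.5, 0.5)]
batch = extend_rollout(real_rollout, toy_policy, toy_model, synthetic_steps=2)
```

In this sketch the extended batch keeps the real transitions first and the synthetic ones last, so a short synthetic horizon limits how far model error can compound.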
Problem

Research questions and friction points this paper is trying to address.

Improving sample efficiency in quadrupedal locomotion controllers
Reducing required interaction for robust RL-based performance
Enhancing command tracking ability with fewer simulated steps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dyna-Style MBRL with synthetic data generation
Predictive model creates short-horizon synthetic transitions
Rollout length scheduling strategy improves efficiency
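A schedule that introduces synthetic steps gradually as policy updates accumulate could look like the sketch below. The linear ramp, the function name `synthetic_steps_at`, and the parameters `start_iter`, `end_iter`, and `max_steps` are hypothetical choices for illustration; the source states only that the schedule depends on the policy update iteration, not its exact shape.

```python
def synthetic_steps_at(
    iteration: int, start_iter: int, end_iter: int, max_steps: int
) -> int:
    """Number of synthetic steps to append at a given policy update iteration.

    Assumed shape: zero before `start_iter`, a linear ramp between
    `start_iter` and `end_iter`, then constant at `max_steps`.
    """
    if end_iter <= start_iter:
        raise ValueError("end_iter must be greater than start_iter")
    if iteration <= start_iter:
        return 0
    if iteration >= end_iter:
        return max_steps
    fraction = (iteration - start_iter) / (end_iter - start_iter)
    return int(fraction * max_steps)


# Example: no synthetic data early on, while the model is still poorly trained.
schedule = [synthetic_steps_at(i, 100, 500, 8) for i in (0, 100, 300, 500, 1000)]
```

Delaying the ramp gives the predictive model time to fit real data before its predictions influence the policy update, which is one plausible motivation for an iteration-based schedule.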
Authors

Francisco Affonso
Mobile Robotics Group, São Carlos School of Engineering, University of São Paulo (EESC-USP), BR
Felipe Andrade G. Tommaselli
Mobile Robotics Group, São Carlos School of Engineering, University of São Paulo (EESC-USP), BR
Juliano Negri
Mobile Robotics Group, São Carlos School of Engineering, University of São Paulo (EESC-USP), BR
Vivian S. Medeiros
Mobile Robotics Group, São Carlos School of Engineering, University of São Paulo (EESC-USP), BR
Mateus V. Gasparino
Field Robotics Engineering and Science Hub (FRESH), University of Illinois at Urbana-Champaign (UIUC), USA
Girish Chowdhary
Associate Professor
Robotics, Agricultural Robotics, Adaptive Control, Mobile Robotics
Marcelo Becker
Mobile Robotics Group, São Carlos School of Engineering, University of São Paulo (EESC-USP), BR