🤖 AI Summary
Existing humanoid robot gait controllers struggle to achieve both naturalness and disturbance rejection: reward functions based solely on task performance often yield stiff gaits, while motion imitation improves visual realism at the cost of robustness. This work proposes Predictive Style Matching (PSM), which uses an offline-trained, state-conditioned predictor to map lower-body state history and velocity commands to upper-body joint targets and gait style descriptors. During reinforcement learning, these predictions define a style reward term that is combined with the task reward to balance natural appearance against task performance. The predictor is used only during training, so the deployed policy remains purely proprioceptive and adds no inference overhead. On the Unitree G1, in simulation and on hardware, the method reduces upper-body style error by roughly an order of magnitude relative to task-only RL while preserving its disturbance rejection, and its failure rate in perturbation recovery is about one-fifth that of the motion-imitation baseline.
📝 Abstract
Reinforcement learning has become the prevailing approach to humanoid locomotion control: policies transfer reliably from simulation to hardware and recover gracefully from disturbances.
Motion quality, however, still lags behind: task-only rewards often converge to stiff, asymmetric gaits, while motion-imitation methods improve appearance but become more sensitive to external disturbances, because reference signals can oppose the transient poses needed to regain balance.
We propose Predictive Style Matching (PSM), in which an offline predictor maps the robot's lower-body state history and velocity commands to interpretable upper-body joint and gait targets that shape the rewards during training.
Because the targets are state-conditioned rather than time-indexed and the predictor is used only at training time, the deployed controller inherits the proprioceptive interface and inference cost of a task-only RL baseline.
On the Unitree G1, in both simulation and hardware, PSM reduces upper-body style error by roughly an order of magnitude over task-only RL while preserving its fall-recovery rate, whereas the motion-imitation baseline attains the lowest style error but fails to recover from disturbances about five times as often.
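To make the training-time mechanism concrete, below is a minimal sketch of how a frozen, state-conditioned predictor could feed a style term into an RL reward. Everything here is an assumption, not the authors' implementation: the linear `LinearStylePredictor` is a hypothetical stand-in for whatever model the paper trains offline; the joint counts, history length, Gaussian kernel width `sigma`, and weight `w_style` are illustrative; and only upper-body joint targets are modeled (the gait descriptors are omitted).

```python
import numpy as np

# Hypothetical dimensions; the abstract does not specify them.
HISTORY_LEN = 5   # number of past lower-body states fed to the predictor
N_LOWER = 12      # lower-body joint count (assumed)
N_UPPER = 11      # upper-body joint count (assumed)
CMD_DIM = 3       # velocity command: (vx, vy, yaw rate), assumed layout


class LinearStylePredictor:
    """Hypothetical stand-in for the offline-trained predictor (linear here).

    Maps a lower-body state history and a velocity command to upper-body
    joint targets. It is frozen: queried during RL, never updated.
    """

    def __init__(self, weights: np.ndarray, bias: np.ndarray):
        self.weights = weights  # shape (N_UPPER, HISTORY_LEN * N_LOWER + CMD_DIM)
        self.bias = bias        # shape (N_UPPER,)

    def predict(self, lower_history: np.ndarray, command: np.ndarray) -> np.ndarray:
        # Condition on robot state (history) and command, not on a clock/phase index.
        features = np.concatenate([lower_history.ravel(), command])
        return self.weights @ features + self.bias


def style_reward(q_upper: np.ndarray, q_upper_target: np.ndarray, sigma: float = 0.5) -> float:
    """Gaussian kernel on upper-body tracking error; 1.0 at zero error, decays toward 0."""
    err_sq = float(np.sum((q_upper - q_upper_target) ** 2))
    return float(np.exp(-err_sq / sigma**2))


def total_reward(task_r: float, q_upper: np.ndarray, lower_history: np.ndarray,
                 command: np.ndarray, predictor: LinearStylePredictor,
                 w_style: float = 0.5) -> float:
    """Task reward plus a weighted style term (assumed additive combination).

    The predictor is called only here, inside the training reward; the
    policy's observations never include its outputs.
    """
    target = predictor.predict(lower_history, command)
    return task_r + w_style * style_reward(q_upper, target)
```

For example, with an all-zero predictor and an upper-body pose equal to its zero target, `style_reward` returns 1.0 and `total_reward(1.0, ...)` with `w_style=0.5` returns 1.5. Because the style term lives only in the reward, the trained policy's input and output interface is the same as a task-only policy's, in line with the abstract's claim that the deployed controller keeps the baseline's proprioceptive interface and inference cost.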