🤖 AI Summary
Humanoid robots often suffer from poor command tracking, compounding distribution shift, and task failure in loco-manipulation because high-level planning is disconnected from low-level control. This work proposes REFINE-DP, a framework that applies a PPO-based diffusion policy gradient to jointly optimize a high-level diffusion planner and a low-level controller in a coordinated training scheme, mitigating the distribution mismatch between them. The approach achieves over 90% task success in simulation, including on out-of-distribution scenarios, and smooth autonomous execution in real-world dynamic environments, substantially outperforming pre-trained baselines in both motion quality and task robustness.
📝 Abstract
Humanoid loco-manipulation requires coordinating high-level motion plans with stable low-level whole-body execution under complex robot-environment dynamics and long-horizon tasks. While diffusion policies (DPs) show promise for learning from demonstrations, deploying them on humanoids poses critical challenges: a motion planner trained offline is decoupled from the low-level controller, leading to poor command tracking, compounding distribution shift, and task failures. The common remedy of scaling demonstration data is prohibitively expensive for high-dimensional humanoid systems. To address this challenge, we present REFINE-DP (REinforcement learning FINE-tuning of Diffusion Policy), a hierarchical framework that jointly optimizes a DP high-level planner and an RL-based low-level loco-manipulation controller. The DP is fine-tuned via a PPO-based diffusion policy gradient to improve task success, while the controller is simultaneously updated to accurately track the planner's evolving command distribution, reducing the distributional mismatch that degrades motion quality. We validate REFINE-DP on a humanoid robot performing loco-manipulation tasks, including door traversal and long-horizon object transport. REFINE-DP achieves a success rate of over $90\%$ in simulation, even in out-of-distribution cases not seen in the pre-training data, and enables smooth autonomous task execution in real-world dynamic environments. Our method substantially outperforms pre-trained DP baselines and demonstrates that RL fine-tuning is key to reliable humanoid loco-manipulation. https://refine-dp.github.io/REFINE-DP/
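The abstract does not spell out how a PPO objective attaches to a diffusion planner, so here is a minimal, illustrative sketch of the general idea: each denoising step is a Gaussian action whose mean comes from the network, so a denoising chain has a tractable log-likelihood, and PPO's clipped surrogate can be applied to the ratio of chain likelihoods under new vs. old parameters. Everything here (`DenoiseNet`, the step count `K`, the fixed noise scale `SIGMA`, the placeholder advantages) is an assumption for illustration, not REFINE-DP's actual architecture or training loop.

```python
# Illustrative sketch only: PPO-style fine-tuning of a toy diffusion policy.
# Each denoising step k samples a_{k-1} ~ N(mu_theta(obs, a_k, k), SIGMA^2 I),
# giving the chain a tractable Gaussian log-prob for the PPO ratio.
# Sizes, K, SIGMA, and CLIP are arbitrary placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
K, ACT, OBS, SIGMA, CLIP = 4, 2, 3, 0.1, 0.2

class DenoiseNet(nn.Module):
    """Predicts the mean of the next (less noisy) action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS + ACT + 1, 32), nn.Tanh(), nn.Linear(32, ACT))

    def forward(self, obs, a, k):
        kf = torch.full_like(a[:, :1], k / K)  # scalar step embedding
        return self.net(torch.cat([obs, a, kf], dim=-1))

def sample_chain(net, obs):
    """Roll out a_K -> a_0; keep all intermediates for later re-scoring."""
    chain = [torch.randn(obs.shape[0], ACT)]
    with torch.no_grad():
        for k in range(K, 0, -1):
            mu = net(obs, chain[-1], k)
            chain.append(mu + SIGMA * torch.randn_like(mu))
    return chain

def chain_logp(net, obs, chain):
    """Summed Gaussian log-prob of a stored chain under current params."""
    logp = torch.zeros(obs.shape[0])
    for i, k in enumerate(range(K, 0, -1)):
        mu = net(obs, chain[i], k)
        logp = logp + torch.distributions.Normal(
            mu, SIGMA).log_prob(chain[i + 1]).sum(-1)
    return logp

net = DenoiseNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
obs = torch.randn(8, OBS)
chain = sample_chain(net, obs)
old_logp = chain_logp(net, obs, chain).detach()
adv = torch.randn(8)  # placeholder advantages from the task reward

# PPO clipped surrogate on the chain-likelihood ratio.
ratio = torch.exp(chain_logp(net, obs, chain) - old_logp)
loss = -torch.min(ratio * adv,
                  torch.clamp(ratio, 1 - CLIP, 1 + CLIP) * adv).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

In REFINE-DP the advantages would come from task reward in simulation, and the low-level controller would be updated in parallel to track the planner's shifting command distribution; this sketch covers only the planner-side PPO update.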