🤖 AI Summary
Humanoid robots often suffer from poor command tracking, compounding distribution shift, and task failure in loco-manipulation because high-level planning is disconnected from low-level control. This work proposes REFINE-DP, a framework that applies a PPO-based diffusion policy gradient to jointly optimize a high-level diffusion planner and a low-level controller in a coordinated training scheme, mitigating the distribution mismatch between them. The approach achieves over 90% task success in simulation, including on out-of-distribution scenarios, and smooth autonomous execution in real-world dynamic environments, substantially outperforming pre-trained baselines in both motion quality and task robustness.
📝 Abstract
Humanoid loco-manipulation requires coordinating high-level motion plans with stable low-level whole-body execution under complex robot-environment dynamics and long-horizon tasks. While diffusion policies (DPs) show promise for learning from demonstrations, deploying them on humanoids poses critical challenges: a motion planner trained offline is decoupled from the low-level controller, leading to poor command tracking, compounding distribution shift, and task failures. The common remedy of scaling demonstration data is prohibitively expensive for high-dimensional humanoid systems. To address this challenge, we present REFINE-DP (REinforcement learning FINE-tuning of Diffusion Policy), a hierarchical framework that jointly optimizes a DP high-level planner and an RL-based low-level loco-manipulation controller. The DP is fine-tuned via a PPO-based diffusion policy gradient to improve task success, while the controller is simultaneously updated to accurately track the planner's evolving command distribution, reducing the distributional mismatch that degrades motion quality. We validate REFINE-DP on a humanoid robot performing loco-manipulation tasks, including door traversal and long-horizon object transport. REFINE-DP achieves a success rate of over $90\%$ in simulation, even in out-of-distribution cases not seen in the pre-training data, and enables smooth autonomous task execution in real-world dynamic environments. Our method substantially outperforms pre-trained DP baselines and demonstrates that RL fine-tuning is key to reliable humanoid loco-manipulation. https://refine-dp.github.io/REFINE-DP/
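The abstract does not spell out how a PPO objective attaches to a diffusion planner, so here is a minimal, illustrative sketch of the general idea: each denoising step is a Gaussian action whose mean comes from the network, so a denoising chain has a tractable log-likelihood, and PPO's clipped surrogate can be applied to the ratio of chain likelihoods under new vs. old parameters. Everything here (`DenoiseNet`, the step count `K`, the fixed noise scale `SIGMA`, the placeholder advantages) is an assumption for illustration, not REFINE-DP's actual architecture or training loop.

```python
# Illustrative sketch only: PPO-style fine-tuning of a toy diffusion policy.
# Each denoising step k samples a_{k-1} ~ N(mu_theta(obs, a_k, k), SIGMA^2 I),
# giving the chain a tractable Gaussian log-prob for the PPO ratio.
# Sizes, K, SIGMA, and CLIP are arbitrary placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
K, ACT, OBS, SIGMA, CLIP = 4, 2, 3, 0.1, 0.2

class DenoiseNet(nn.Module):
    """Predicts the mean of the next (less noisy) action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS + ACT + 1, 32), nn.Tanh(), nn.Linear(32, ACT))

    def forward(self, obs, a, k):
        kf = torch.full_like(a[:, :1], k / K)  # scalar step embedding
        return self.net(torch.cat([obs, a, kf], dim=-1))

def sample_chain(net, obs):
    """Roll out a_K -> a_0; keep all intermediates for later re-scoring."""
    chain = [torch.randn(obs.shape[0], ACT)]
    with torch.no_grad():
        for k in range(K, 0, -1):
            mu = net(obs, chain[-1], k)
            chain.append(mu + SIGMA * torch.randn_like(mu))
    return chain

def chain_logp(net, obs, chain):
    """Summed Gaussian log-prob of a stored chain under current params."""
    logp = torch.zeros(obs.shape[0])
    for i, k in enumerate(range(K, 0, -1)):
        mu = net(obs, chain[i], k)
        logp = logp + torch.distributions.Normal(
            mu, SIGMA).log_prob(chain[i + 1]).sum(-1)
    return logp

net = DenoiseNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
obs = torch.randn(8, OBS)
chain = sample_chain(net, obs)
old_logp = chain_logp(net, obs, chain).detach()
adv = torch.randn(8)  # placeholder advantages from the task reward

# PPO clipped surrogate on the chain-likelihood ratio.
ratio = torch.exp(chain_logp(net, obs, chain) - old_logp)
loss = -torch.min(ratio * adv,
                  torch.clamp(ratio, 1 - CLIP, 1 + CLIP) * adv).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

In REFINE-DP the advantages would come from task reward in simulation, and the low-level controller would be updated in parallel to track the planner's shifting command distribution; this sketch covers only the planner-side PPO update.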