Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation

📅 2025-12-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Nonprehensile object manipulation in cluttered environments is challenging due to complex contact dynamics and the difficulty of long-horizon planning. This paper proposes a hierarchical control architecture: a high-level reinforcement learning (PPO) agent generates semantic intermediate goals, while a low-level goal-conditioned diffusion model synthesizes physically feasible, efficient trajectories in real time. To the authors' knowledge, this is the first work to synergistically integrate reinforcement learning with conditional diffusion models for nonprehensile manipulation, enabling goal-directed, decoupled control. Evaluated in a 2D physics simulator, the method achieves a 92.3% success rate, reduces path length by 37% compared to state-of-the-art methods, and demonstrates strong generalization and scalability across diverse obstacle configurations. These results substantially improve the practicality and robustness of nonprehensile manipulation systems.

📝 Abstract
Nonprehensile manipulation, such as pushing objects across cluttered environments, presents a challenging control problem due to complex contact dynamics and long-horizon planning requirements. In this work, we propose HeRD, a hierarchical reinforcement learning-diffusion policy that decomposes pushing tasks into two levels: high-level goal selection and low-level trajectory generation. We employ a high-level reinforcement learning (RL) agent to select intermediate spatial goals, and a low-level goal-conditioned diffusion model to generate feasible, efficient trajectories to reach them. This architecture combines the long-term reward-maximizing behaviour of RL with the generative capabilities of diffusion models. We evaluate our method in a 2D simulation environment and show that it outperforms the state-of-the-art baseline in success rate, path efficiency, and generalization across multiple environment configurations. Our results suggest that hierarchical control with generative low-level planning is a promising direction for scalable, goal-directed nonprehensile manipulation. Code, documentation, and trained models are available: https://github.com/carosteven/HeRD.
Problem

Research questions and friction points this paper is trying to address.

Complex contact dynamics make nonprehensile pushing hard to control
Long-horizon planning is difficult in cluttered environments
Existing methods struggle with success rate, path efficiency, and generalization across environment configurations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical RL-diffusion policy for manipulation
High-level RL selects intermediate spatial goals
Low-level diffusion model generates efficient trajectories
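The two-level loop described above can be sketched in a few lines. Note this is a hypothetical illustration, not the HeRD implementation: the class names, the 2D goal representation, and both policy bodies are assumptions. The trained PPO agent is replaced by a bounded step toward the final goal, and the diffusion model by simple interpolation, to show only how goal selection and trajectory generation are decoupled.

```python
import numpy as np

class HighLevelGoalSelector:
    """Stand-in for the high-level PPO agent: picks an intermediate
    2D subgoal. A trained policy would condition on obstacles/clutter."""

    def __init__(self, step_size=0.5):
        self.step_size = step_size

    def select_goal(self, obj_pos, final_goal):
        # Take a bounded step toward the final goal.
        direction = final_goal - obj_pos
        dist = np.linalg.norm(direction)
        if dist <= self.step_size:
            return final_goal.copy()
        return obj_pos + self.step_size * direction / dist

class GoalConditionedTrajectoryModel:
    """Stand-in for the low-level diffusion model: emits a short
    waypoint trajectory toward the subgoal. A trained model would
    denoise a trajectory conditioned on (obj_pos, subgoal)."""

    def __init__(self, horizon=8):
        self.horizon = horizon

    def generate(self, obj_pos, subgoal):
        alphas = np.linspace(0.0, 1.0, self.horizon)[:, None]
        return obj_pos + alphas * (subgoal - obj_pos)

def push_to_goal(obj_pos, final_goal, max_subgoals=20, tol=1e-3):
    """Hierarchical loop: select a subgoal, roll out a trajectory,
    repeat until the object reaches the final goal."""
    high = HighLevelGoalSelector()
    low = GoalConditionedTrajectoryModel()
    obj_pos = np.asarray(obj_pos, dtype=float)
    final_goal = np.asarray(final_goal, dtype=float)
    for _ in range(max_subgoals):
        subgoal = high.select_goal(obj_pos, final_goal)
        traj = low.generate(obj_pos, subgoal)
        obj_pos = traj[-1]  # assume the trajectory is tracked perfectly
        if np.linalg.norm(obj_pos - final_goal) < tol:
            return True, obj_pos
    return False, obj_pos
```

The decoupling is the key point: the high-level selector reasons only about where the object should go next, while the low-level generator handles how to get there, so each component can be trained and swapped independently.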