Learning Diffusion Policy from Primitive Skills for Robot Manipulation

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing diffusion-based policies often generate actions misaligned with high-level task intentions. To address this, this work proposes Skill-conditioned Diffusion Policy (SDP), which decomposes complex manipulation tasks into reusable, fine-grained primitive skills—such as “move up” or “open gripper”—and leverages a vision-language model to extract discrete representations of both environmental states and language instructions. A lightweight routing network dynamically selects the most relevant skill at each step, guiding a single-skill diffusion policy to produce aligned actions. This approach establishes a novel, skill-consistent action generation paradigm by integrating interpretable primitives with diffusion models for the first time, significantly enhancing cross-task generalization. Experiments demonstrate that SDP outperforms state-of-the-art methods across two simulation benchmarks and real-world robotic platforms, confirming its effectiveness and robustness.

📝 Abstract
Diffusion policies (DP) have recently shown great promise for generating actions in robotic manipulation. However, existing approaches often rely on global instructions to produce short-term control signals, which can result in misalignment in action generation. We conjecture that primitive skills, referred to as fine-grained, short-horizon manipulations such as "move up" and "open the gripper", provide a more intuitive and effective interface for robot learning. To bridge this gap, we propose SDP, a skill-conditioned DP that integrates interpretable skill learning with conditional action planning. SDP abstracts eight reusable primitive skills across tasks and employs a vision-language model to extract discrete representations from visual observations and language instructions. Based on these representations, a lightweight router network is designed to assign a desired primitive skill to each state, which helps construct a single-skill policy to generate skill-aligned actions. By decomposing complex tasks into a sequence of primitive skills and selecting a single-skill policy, SDP ensures skill-consistent behavior across diverse tasks. Extensive experiments on two challenging simulation benchmarks and real-world robot deployments demonstrate that SDP consistently outperforms SOTA methods, providing a new paradigm for skill-based robot learning with diffusion policies.
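The routing step described above (a lightweight network assigns one of eight primitive skills to each state, then dispatches to that skill's policy) can be sketched as follows. The paper does not publish code, so all names here (`SKILLS`, `route_skill`, the similarity-based router, the stub policies) are illustrative assumptions, not the authors' implementation; a minimal router is modeled as a cosine-similarity lookup over discrete skill representations.

```python
import numpy as np

# Hypothetical skill vocabulary; the paper abstracts eight reusable
# primitives, of which "move up" and "open gripper" are named examples.
SKILLS = ["move_up", "move_down", "move_left", "move_right",
          "move_forward", "move_backward", "open_gripper", "close_gripper"]

def route_skill(state_embedding: np.ndarray, skill_embeddings: np.ndarray) -> int:
    """Pick the primitive skill whose discrete representation best matches
    the current state (here: highest cosine similarity)."""
    s = state_embedding / np.linalg.norm(state_embedding)
    k = skill_embeddings / np.linalg.norm(skill_embeddings, axis=1, keepdims=True)
    return int(np.argmax(k @ s))

def act(state_embedding, skill_embeddings, skill_policies):
    """Dispatch to the single-skill policy chosen by the router."""
    idx = route_skill(state_embedding, skill_embeddings)
    name = SKILLS[idx]
    return name, skill_policies[name](state_embedding)

# Toy usage: each "policy" is a stub returning a fixed one-hot action;
# in SDP each would be a skill-conditioned diffusion policy.
rng = np.random.default_rng(0)
skill_embeddings = rng.normal(size=(len(SKILLS), 16))
policies = {name: (lambda s, i=i: np.eye(len(SKILLS))[i])
            for i, name in enumerate(SKILLS)}
state = skill_embeddings[6] + 0.01 * rng.normal(size=16)  # near "open_gripper"
name, action = act(state, skill_embeddings, policies)
```

The design point this illustrates is the separation of concerns: the router resolves *which* short-horizon skill applies, and only the selected single-skill policy generates low-level actions, keeping them aligned with the chosen primitive.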
Problem

Research questions and friction points this paper is trying to address.

diffusion policy
robot manipulation
primitive skills
action generation
skill alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion policy
primitive skills
skill-conditioned control
vision-language model
robot manipulation
Zhihao Gu
Department of Computer Science, School of Computing and Data Science, The University of Hong Kong
Ming Yang
School of Software, Beihang University
Difan Zou
The University of Hong Kong
Machine Learning · Deep Learning · Optimization · Stochastic Algorithms · Signal Processing
Dong Xu
Master of Computer Science, Fudan University
Long Context Model · RAG · Hallucination