Set-Supervised Diffusion Policy: Learning Action-Chunking Diffusion through Corrections

📅 2026-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two limitations of existing diffusion-based policies for robotic manipulation: they are sensitive to distributional shift, and they often rely on costly expert demonstrations. Conventional behavior cloning typically ignores negative samples (the erroneous robot actions that humans correct), which leads to overfitting to the teacher's actions. To address this, the paper introduces a contrastive learning framework that explicitly uses paired positive and negative action chunks extracted from human corrective interventions. By formulating supervision over sets of desired action chunks, the method combines action-chunking diffusion policies with human-in-the-loop data aggregation. Across multiple manipulation tasks, the approach consistently improves policy performance, with particularly strong gains in robustness to noisy data, and it produces higher-quality aggregated datasets that make learning from corrections more data-efficient.
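The summary does not spell out how the paired chunks are obtained. A minimal sketch, assuming a rollout log that records the policy's proposed action chunk at every step alongside a per-step intervention flag, could pair the human's executed actions (positive) with the chunk the robot had proposed for the same window (negative). The names `extract_chunk_pairs` and `ChunkPair`, the log layout, and the rule of pairing only windows fully under human control are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class ChunkPair:
    obs_index: int        # timestep whose observation conditions both chunks
    positive: np.ndarray  # (H, action_dim) human corrective chunk
    negative: np.ndarray  # (H, action_dim) robot chunk the human overrode


def extract_chunk_pairs(robot_chunks, human_actions, intervened, horizon):
    """Hypothetical pairing rule for human-in-the-loop rollouts.

    robot_chunks:  (T, H, action_dim) chunk the policy proposed at each step
    human_actions: (T, action_dim) teacher actions (meaningful where intervened)
    intervened:    (T,) flags, True where the human was in control
    horizon:       chunk length H
    """
    robot_chunks = np.asarray(robot_chunks, dtype=float)
    human_actions = np.asarray(human_actions, dtype=float)
    intervened = np.asarray(intervened, dtype=bool)
    assert robot_chunks.shape[1] == horizon, "proposed chunks must have length H"

    pairs = []
    for t in range(len(intervened) - horizon + 1):
        # Assumption: only pair windows where the human controls every step,
        # so the positive chunk is entirely teacher-provided.
        if intervened[t:t + horizon].all():
            pairs.append(ChunkPair(
                obs_index=t,
                positive=human_actions[t:t + horizon].copy(),
                negative=robot_chunks[t].copy(),
            ))
    return pairs
```

For example, with six logged steps, a chunk length of 2, and the human in control at steps 2 to 4, this rule yields pairs conditioned on steps 2 and 3 only.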
📝 Abstract
Diffusion policies have recently emerged as a powerful framework for robotic manipulation. However, like other behavior cloning methods, they remain vulnerable to distributional shift, often requiring human-in-the-loop interventions to correct failures during deployment. These interactions naturally provide paired supervision in the form of the robot's undesired actions and the human teacher's corrective actions. Yet existing data aggregation pipelines and standard behavior cloning losses largely ignore this negative signal from undesired actions, leading to overfitting to the teacher's actions and an increasing reliance on costly expert data. To address this limitation, we propose Set-Supervised Diffusion Policy (SDP), a novel learning framework that utilizes contrastive action-chunk data to train diffusion policies from human corrections. From paired positive and negative action-chunks, SDP constructs a set of desired action-chunks and designs a training pipeline that encourages the diffusion policy to align with the set. Through extensive experiments across multiple robotic manipulation tasks, we demonstrate that SDP consistently improves policy performance, with particularly strong gains in robustness to noisy data. Moreover, SDP induces high-quality aggregated datasets, enabling more efficient and reliable policy learning from human-in-the-loop corrections. Our code is available at https://set-supervised-diffusion-policy.github.io/.
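The abstract states only that SDP builds a set of desired action chunks and trains the diffusion policy to align with that set; it does not give the objective. One hypothetical way to express set alignment is a loss on the denoiser's clean-chunk estimate that penalizes only the distance to the nearest member of the desired set, plus a hinge term that pushes the estimate away from the negative chunk. This is an illustrative stand-in, not SDP's actual loss; `set_supervised_loss`, `margin`, and `neg_weight` are assumed names and parameters.

```python
import numpy as np


def set_supervised_loss(pred_x0, positives, negative, margin=1.0, neg_weight=0.1):
    """Hypothetical set-alignment loss (not the paper's objective).

    pred_x0:   (H, D) denoiser's estimate of the clean action chunk
    positives: (K, H, D) set of desired chunks for the same observation
    negative:  (H, D) undesired robot chunk for the same observation
    """
    pred_x0 = np.asarray(pred_x0, dtype=float)
    positives = np.asarray(positives, dtype=float)
    negative = np.asarray(negative, dtype=float)

    # Mean squared distance from the prediction to each desired chunk.
    d_pos = ((positives - pred_x0[None]) ** 2).mean(axis=(1, 2))
    # Set supervision: only the closest desired chunk is penalized, so the
    # prediction may land on any member of the set instead of their average.
    set_term = d_pos.min()

    # Hinge on the negative chunk: penalize predictions closer than `margin`.
    d_neg = ((negative - pred_x0) ** 2).mean()
    neg_term = max(0.0, margin - d_neg)

    return float(set_term + neg_weight * neg_term)
```

Taking the minimum over the set, rather than regressing to a single teacher chunk, means any desired chunk can satisfy the loss, which is one reading of "set supervision". In a real training loop, `pred_x0` would come from the diffusion model evaluated at a sampled noise level.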
Problem

Research questions and friction points this paper is trying to address.

diffusion policy
distributional shift
human-in-the-loop
behavior cloning
action-chunking
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion policy
action-chunking
set-supervised learning
human-in-the-loop correction
contrastive learning