ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation

📅 2024-06-03
🏛️ arXiv.org
📈 Citations: 9
Influential: 2
🤖 AI Summary
Existing diffusion-based policies suffer from poor real-time performance due to multi-step denoising, failing to meet the low-latency requirements of dexterous 3D robotic manipulation. Method: This work pioneers the integration of consistency models into robot action generation, proposing an action-space conditional consistency diffusion framework and a consistency distillation scheme that enables precise single-step action synthesis on a low-dimensional action manifold. The approach leverages point-cloud-driven consistency ODE modeling, single-step forward generation, and multi-task simulation training across Adroit and Meta-World benchmarks. Contribution/Results: Evaluated on 31 manipulation tasks, the model achieves a 10× speedup in average inference latency over prior diffusion methods while attaining state-of-the-art success rates. Notably, it is the first diffusion-inspired policy successfully deployed online on real robots under strict latency constraints.

📝 Abstract
Diffusion models have been verified to be effective in generating complex distributions, from natural images to motion trajectories. Recent diffusion-based methods show impressive performance on 3D robotic manipulation tasks, but they suffer from severe runtime inefficiency due to the multiple denoising steps required, especially with high-dimensional observations. To this end, we propose a real-time robotic manipulation model named ManiCM that imposes a consistency constraint on the diffusion process, so that the model can generate robot actions in a single inference step. Specifically, we formulate a consistent diffusion process in the robot action space conditioned on the point cloud input, where the original action is required to be directly denoised from any point along the ODE trajectory. To model this process, we design a consistency distillation technique that predicts the action sample directly, rather than predicting the noise as is common in the vision community, for fast convergence in the low-dimensional action manifold. We evaluate ManiCM on 31 robotic manipulation tasks from Adroit and Meta-World, and the results demonstrate that our approach accelerates the state-of-the-art method by 10× in average inference speed while maintaining a competitive average success rate.
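The one-step generation idea from the abstract can be sketched as follows. This is an illustrative toy, not the paper's implementation: the noise schedule, the `denoiser`, and the 7-dimensional `target_action` are all hypothetical stand-ins for a trained, point-cloud-conditioned student network. The key property is that the consistency function maps *any* point along the ODE trajectory straight to the clean action, with the identity as its boundary condition at t = 0.

```python
import numpy as np

def sigma(t, sigma_max=80.0):
    # Noise level along the probability-flow ODE (a simple linear
    # schedule, assumed for illustration).
    return sigma_max * t

def consistency_fn(x_t, t, denoiser):
    # Consistency function: maps any noisy point on the ODE trajectory
    # directly to the clean sample in a single call.
    # Boundary condition: at t = 0 it is the identity.
    if t == 0.0:
        return x_t
    return denoiser(x_t, t)

# Hypothetical "distilled" denoiser that has collapsed onto one
# 7-DoF target action (a perfect student for this toy case).
target_action = np.array([0.1, -0.3, 0.5, 0.0, 0.2, -0.1, 0.4])
denoiser = lambda x, t: target_action

# One-step action generation: start from pure noise, one function call.
rng = np.random.default_rng(0)
x_T = rng.normal(scale=sigma(1.0), size=7)
action = consistency_fn(x_T, 1.0, denoiser)
```

Contrast with a standard diffusion policy, which would iterate the denoiser tens of times per action chunk; here the latency is a single forward pass, which is what enables the reported real-time deployment.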
Problem

Research questions and friction points this paper is trying to address.

Real-time 3D robotic manipulation with diffusion models
Reduce diffusion model inference steps for efficiency
Maintain performance while accelerating action generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-step inference via consistency model
Consistency distillation for action prediction
Real-time 3D robotic manipulation
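The consistency-distillation idea in the bullets above can be sketched under assumed notation: a frozen teacher takes one ODE solver step from (x_{t_{n+1}}, t_{n+1}) back to an estimate x̂_{t_n}, and the student f is trained so its output at t_{n+1} matches a target network's output at t_n. Everything below (the VE-style ODE, the Euler solver, the toy score) is a hedged sketch, not the authors' code.

```python
import numpy as np

def euler_ode_step(x, t_next, t_n, score_teacher):
    # One Euler step of the probability-flow ODE dx/dt = -t * score(x, t)
    # (variance-exploding parameterization with sigma(t) = t, assumed).
    return x + (t_n - t_next) * (-t_next * score_teacher(x, t_next))

def distillation_loss(f_student, f_target, x_next, t_next, t_n, score_teacher):
    # Teacher solves one ODE step back toward the data.
    x_hat = euler_ode_step(x_next, t_next, t_n, score_teacher)
    # Self-consistency: student output at t_{n+1} should match the
    # (EMA) target network's output at t_n.
    return np.mean((f_student(x_next, t_next) - f_target(x_hat, t_n)) ** 2)

# Toy usage: a shared zero-valued student/target gives zero loss.
rng = np.random.default_rng(0)
x = rng.normal(size=7)
zero_fn = lambda x, t: np.zeros_like(x)   # hypothetical network output
score = lambda x, t: -x                    # score of a unit Gaussian (toy)
loss = distillation_loss(zero_fn, zero_fn, x, 0.5, 0.4, score)
```

In the paper's setting the student additionally predicts the action sample directly (rather than the noise) and is conditioned on the point cloud; this sketch only shows the shape of the distillation objective.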
Guanxing Lu
Tsinghua University
VLA · RL · Robotics · 3D Vision

Zifeng Gao
Tsinghua Shenzhen International Graduate School, Tsinghua University

Tianxing Chen
Shanghai AI Laboratory

Wen-Dao Dai
Tsinghua Shenzhen International Graduate School, Tsinghua University

Ziwei Wang
Carnegie Mellon University

Yansong Tang
Tsinghua Shenzhen International Graduate School, Tsinghua University