🤖 AI Summary
Flow-based robotic policies often suffer from representation collapse, hindering discrimination among similar visual states and undermining multimodal action modeling. To address this, we propose the first incorporation of dispersive regularization into the flow matching framework, extending MeanFlow with a novel regularized variant operating across multiple intermediate embedding spaces. Our method requires no auxiliary networks or complex training procedures, preserving one-step generation efficiency while significantly enhancing representation diversity. It supports end-to-end training and ultra-fast inference: on the RoboMimic benchmark, it achieves a 20–40× speedup (0.07 s per inference) and improves task success rates by 10–20 percentage points (e.g., 99% on Lift). Furthermore, real-world validation on a Franka Panda robot demonstrates strong Sim2Real transfer.
📝 Abstract
The ability to learn multi-modal action distributions is indispensable for robotic manipulation policies to perform precise and robust control. Flow-based generative models have recently emerged as a promising solution for learning action distributions, offering one-step action generation and thus much higher sampling efficiency than diffusion-based methods. However, existing flow-based policies suffer from representation collapse, the inability to distinguish similar visual representations, leading to failures in precise manipulation tasks. We propose DM1 (MeanFlow with Dispersive Regularization for One-Step Robotic Manipulation), a novel flow matching framework that integrates dispersive regularization into MeanFlow to prevent collapse while maintaining one-step efficiency. DM1 employs multiple dispersive regularization variants across different intermediate embedding layers, encouraging diverse representations across training batches without introducing additional network modules or specialized training procedures. Experiments on RoboMimic benchmarks show that DM1 achieves 20-40 times faster inference (0.07s vs. 2-3.5s) and improves success rates by 10-20 percentage points, with the Lift task reaching 99% success versus the baseline's 85%. Real-robot deployment on a Franka Panda further validates that DM1 transfers effectively from simulation to the physical world. To the best of our knowledge, this is the first work to leverage representation regularization to enable flow-based policies to achieve strong performance in robotic manipulation, establishing a simple yet powerful approach for efficient and robust manipulation.
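The summary does not spell out the dispersive objective, so the following is only a minimal sketch of one common variant of dispersive regularization: an InfoNCE-style loss over a batch of intermediate embeddings that is minimized when embeddings spread apart, added to the flow matching loss with a hypothetical weight `lam`. The function name `dispersive_loss`, the temperature `tau`, and the weighting are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def dispersive_loss(z, tau=1.0):
    """InfoNCE-style dispersive regularizer (illustrative, not DM1's exact loss).

    z: (batch, dim) array of intermediate embeddings.
    Returns log-mean-exp of negative pairwise squared distances over
    off-diagonal pairs: 0 when the batch has collapsed to one point,
    and increasingly negative as embeddings disperse.
    """
    z = np.asarray(z, dtype=np.float64)
    sq = np.sum(z ** 2, axis=1)
    # Pairwise squared Euclidean distances via the expansion ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * z @ z.T
    off_diag = ~np.eye(z.shape[0], dtype=bool)  # exclude self-pairs
    return float(np.log(np.mean(np.exp(-d2[off_diag] / tau))))

def total_loss(flow_matching_loss, z, lam=0.25):
    """Combine the (externally computed) flow matching loss with the
    dispersive term; lam is a hypothetical trade-off weight."""
    return flow_matching_loss + lam * dispersive_loss(z)
```

Because the regularizer only reads embeddings already produced during the forward pass, it adds no networks or extra sampling steps, which is consistent with the summary's claim that one-step generation efficiency is preserved.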