🤖 AI Summary
To address the high computational latency and memory footprint of diffusion-based policies on resource-constrained mobile platforms, this paper proposes a lightweight deployment framework. First, we design a unified pruning-and-retraining pipeline guided by computational sensitivity analysis, explicitly optimizing the model's post-pruning recoverability to preserve denoising accuracy. Second, we introduce consistency distillation to drastically reduce the number of sampling steps. Evaluated on benchmark datasets including PushT and RoboMimic, our method achieves end-to-end real-time inference (>10 FPS), compresses model size by 62%, and maintains state-of-the-art action prediction accuracy. Crucially, it also proves effective in real-world robotic manipulation tasks. The core contribution is the first synergistic integration of structured pruning, recoverability-aware retraining, and consistency distillation for diffusion policy compression, achieving a favorable balance between efficiency and performance.
📝 Abstract
Diffusion Policies have significantly advanced robotic manipulation via imitation learning, but deploying them on resource-constrained mobile platforms remains challenging due to their computational inefficiency and large memory footprint. In this paper, we propose LightDP, a novel framework designed to accelerate Diffusion Policies for real-time deployment on mobile devices. LightDP addresses the computational bottleneck through two core strategies: compressing the denoising network and reducing the number of required sampling steps. We first conduct an extensive computational analysis of existing Diffusion Policy architectures, identifying the denoising network as the primary contributor to latency. To overcome the performance degradation typically associated with conventional pruning methods, we introduce a unified pruning-and-retraining pipeline that explicitly optimizes the model's post-pruning recoverability. Furthermore, we combine pruning with consistency distillation to reduce sampling steps while maintaining action prediction accuracy. Experimental evaluations on standard benchmarks, i.e., PushT, RoboMimic, CALVIN, and LIBERO, demonstrate that LightDP achieves real-time action prediction on mobile devices with competitive accuracy, marking an important step toward practical deployment of diffusion-based policies in resource-limited environments. Extensive real-world experiments further show that LightDP matches the performance of state-of-the-art Diffusion Policies.
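To make the structured-pruning idea concrete, the sketch below shows channel-level magnitude pruning of a single layer, the generic technique that pruning pipelines like the one described above build on. This is an illustrative toy, not the paper's actual pipeline: the layer data, function names, and L1-norm importance score are all assumptions for demonstration.

```python
# Illustrative sketch of structured (channel-level) magnitude pruning.
# This is NOT LightDP's actual pipeline; names and scoring are assumed.

def channel_importance(weights):
    """Score each output channel by the L1 norm of its weights."""
    return [sum(abs(w) for w in channel) for channel in weights]

def prune_channels(weights, prune_ratio):
    """Drop a prune_ratio fraction of channels, keeping the highest-scoring ones."""
    scores = channel_importance(weights)
    n_keep = max(1, round(len(weights) * (1 - prune_ratio)))
    keep = sorted(range(len(weights)), key=lambda i: -scores[i])[:n_keep]
    return [weights[i] for i in sorted(keep)]  # preserve original channel order

# Toy layer: 4 output channels, 3 weights each.
layer = [[0.1, -0.2, 0.05], [1.0, 0.9, -1.1], [0.01, 0.02, 0.0], [0.5, -0.4, 0.6]]
pruned = prune_channels(layer, prune_ratio=0.5)  # keeps the 2 strongest channels
```

In a real pipeline the per-layer `prune_ratio` would be set by sensitivity analysis (layers whose removal hurts accuracy most are pruned less), followed by retraining to recover accuracy.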