🤖 AI Summary
Diffusion models achieve high-quality generation but incur substantial computational overhead. To address this, we propose U-Shape Mamba (USM), an efficient diffusion architecture that integrates Mamba state space layers into a U-Net-style hierarchical structure. USM applies dynamic sequence-length scaling: the encoder progressively compresses token sequences, while the decoder gradually restores them, reducing both computation and memory usage. Compared to Zigma, the most efficient prior Mamba-based diffusion model, USM improves FID by 15.3, 0.84, and 2.7 points on AFHQ, CelebAHQ, and COCO, respectively, while cutting GFLOPs to one-third, lowering GPU memory consumption, and accelerating inference. Together, these gains deliver high-fidelity image synthesis with strong deployment efficiency, narrowing the gap between generative quality and practical applicability.
📝 Abstract
Diffusion models have become the most popular approach for high-quality image generation, but their high computational cost remains a significant challenge. To address this problem, we propose U-Shape Mamba (USM), a novel diffusion model that leverages Mamba-based layers within a U-Net-like hierarchical structure. By progressively reducing sequence length in the encoder and restoring it in the decoder through Mamba blocks, USM significantly lowers computational overhead while maintaining strong generative capabilities. Experimental results against Zigma, currently the most efficient Mamba-based diffusion model, demonstrate that USM requires one-third of the GFLOPs, uses less memory, and runs faster, while outperforming Zigma in image quality: Fréchet Inception Distance (FID) improves by 15.3, 0.84, and 2.7 points on the AFHQ, CelebAHQ, and COCO datasets, respectively. These findings highlight USM as a highly efficient and scalable solution for diffusion-based generative models, making high-quality image synthesis more accessible to the research community while reducing computational costs.
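The U-shaped sequence-length scaling described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `downsample`, `upsample`, and `usm_forward` names are hypothetical, averaging/repetition stand in for whatever learned merging and expansion layers USM uses, and the per-stage Mamba blocks are omitted entirely. It only shows the structural idea: the encoder halves the token sequence at each stage, and the decoder restores it with skip connections.

```python
import numpy as np

def downsample(tokens):
    """Halve the sequence length by averaging adjacent token pairs
    (a stand-in for a learned token-merging layer)."""
    length, dim = tokens.shape
    return tokens.reshape(length // 2, 2, dim).mean(axis=1)

def upsample(tokens):
    """Double the sequence length by repeating each token
    (a stand-in for a learned token-expansion layer)."""
    return np.repeat(tokens, 2, axis=0)

def usm_forward(x, depth=2):
    """Hypothetical U-shaped pass: compress the token sequence in the
    encoder, then restore it in the decoder with skip connections.
    The real model would apply Mamba blocks at every stage."""
    skips = []
    for _ in range(depth):
        skips.append(x)
        x = downsample(x)              # shorter sequence -> cheaper stages
    for _ in range(depth):
        x = upsample(x) + skips.pop()  # restore length, add skip connection
    return x

seq = np.random.randn(16, 8)           # 16 tokens, 8-dim embeddings
out = usm_forward(seq)
assert out.shape == seq.shape          # decoder restores the original length
```

Because a Mamba layer's cost is linear in sequence length, processing most stages on the compressed sequence is what makes the computation and memory savings over a fixed-length model like Zigma plausible.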