🤖 AI Summary
Diffusion models suffer from low inference efficiency and severe error accumulation, hindering practical deployment. To address this, we propose a lightweight compression framework that integrates progressive quantization with calibration-assisted knowledge distillation. Our method introduces a novel two-stage quantization scheme with adaptive, momentum-guided bit-width scheduling, and couples it with teacher-student distillation driven by a full-precision calibration set to preserve generative fidelity at ultra-low bit precision. This design avoids the severe quality degradation that conventional fixed-bit quantization exhibits below 4 bits. Extensive experiments across multiple benchmarks demonstrate that our compressed models achieve Fréchet Inception Distance (FID) and CLIP Score metrics comparable to their full-precision counterparts while reducing inference latency by 50%, and that the framework outperforms existing quantization methods in both generation quality and efficiency.
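The summary above mentions momentum-guided bit-width scheduling without detail. As a rough illustration only (the function name, threshold, and cooldown are assumptions, not the authors' implementation), one way such a scheduler could work is to smooth per-step loss changes with a momentum term and lower the precision only once training has stabilized:

```python
def momentum_bitwidth_schedule(loss_deltas, momentum=0.9, start_bits=8,
                               end_bits=4, threshold=0.01, cooldown=5):
    """Illustrative sketch, not the paper's algorithm: keep an exponential
    moving average (the 'momentum' statistic) of per-step loss changes and
    step the bit-width down by one bit whenever the smoothed change falls
    below `threshold`, with a cooldown between consecutive transitions."""
    bits, ema, wait, schedule = start_bits, 1.0, 0, []
    for delta in loss_deltas:
        ema = momentum * ema + (1 - momentum) * abs(delta)  # smoothed loss change
        wait = max(0, wait - 1)
        if ema < threshold and wait == 0 and bits > end_bits:
            bits -= 1          # training is stable: move to lower precision
            wait = cooldown    # pause before the next transition
        schedule.append(bits)
    return schedule
```

The cooldown mimics the "progressive" aspect: precision drops gradually rather than in one jump, so the network adapts to each intermediate bit-width before the next reduction.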
📝 Abstract
Diffusion models excel at image generation but are computationally and resource-intensive due to their reliance on iterative Markov chain processes, which leads to error accumulation and limits the effectiveness of naive compression techniques. In this paper, we propose PQCAD-DM, a novel hybrid compression framework combining Progressive Quantization (PQ) and Calibration-Assisted Distillation (CAD) to address these challenges. PQ employs a two-stage quantization with adaptive bit-width transitions guided by a momentum-based mechanism, reducing excessive weight perturbations at low precision. CAD leverages full-precision calibration datasets during distillation, enabling the student to match full-precision performance even with a quantized teacher. As a result, PQCAD-DM balances computational efficiency and generative quality, halving inference time while maintaining competitive performance. Extensive experiments validate PQCAD-DM's superior generative capability and efficiency across diverse datasets, outperforming fixed-bit quantization methods.
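The calibration-assisted idea can be made concrete with a minimal sketch (assumptions, not the authors' code: a single linear layer, uniform symmetric fake quantization as a stand-in for PQ, and plain gradient descent on an MSE distillation loss). The key point mirrored from the abstract is that the student's regression targets come from a full-precision calibration set, so the supervision signal carries no quantization noise from the teacher:

```python
import numpy as np

def fake_quantize(w, bits):
    """Uniform symmetric fake quantization (illustrative stand-in for PQ)."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def distill_step(w_student, calib_x, fp_targets, lr=0.1):
    """One hypothetical CAD step for a linear layer: fit the (quantized)
    student's outputs on the calibration batch to full-precision targets."""
    pred = calib_x @ w_student
    # Gradient of mean-squared error with respect to the student weights.
    grad = 2.0 * calib_x.T @ (pred - fp_targets) / len(calib_x)
    return w_student - lr * grad
```

A quick usage check: quantize a random weight matrix to 4 bits, then run a few distillation steps against full-precision outputs; the calibration-set MSE should shrink, recovering part of the accuracy lost to quantization.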