🤖 AI Summary
Diffusion models suffer from low inference efficiency and severe error accumulation, hindering practical deployment. To address this, we propose a lightweight compression framework that integrates progressive quantization with calibration-assisted knowledge distillation. Our method introduces a novel two-stage quantization scheme with adaptive, momentum-guided bit-width scheduling, and couples it with teacher-student distillation driven by a full-precision calibration set to preserve generative fidelity at ultra-low bit precision. This design avoids the severe quality degradation that conventional fixed-bit quantization exhibits below 4 bits. Extensive experiments across multiple benchmarks demonstrate that our compressed models achieve Fréchet Inception Distance (FID) and CLIP Score metrics comparable to their full-precision counterparts while reducing inference latency by 50%, and that the framework outperforms existing quantization methods in both generation quality and efficiency.
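The summary above mentions momentum-guided bit-width scheduling without detail. As a rough illustration only (the function name, threshold, and cooldown are assumptions, not the authors' implementation), one way such a scheduler could work is to smooth per-step loss changes with a momentum term and lower the precision only once training has stabilized:

```python
def momentum_bitwidth_schedule(loss_deltas, momentum=0.9, start_bits=8,
                               end_bits=4, threshold=0.01, cooldown=5):
    """Illustrative sketch, not the paper's algorithm: keep an exponential
    moving average (the 'momentum' statistic) of per-step loss changes and
    step the bit-width down by one bit whenever the smoothed change falls
    below `threshold`, with a cooldown between consecutive transitions."""
    bits, ema, wait, schedule = start_bits, 1.0, 0, []
    for delta in loss_deltas:
        ema = momentum * ema + (1 - momentum) * abs(delta)  # smoothed loss change
        wait = max(0, wait - 1)
        if ema < threshold and wait == 0 and bits > end_bits:
            bits -= 1          # training is stable: move to lower precision
            wait = cooldown    # pause before the next transition
        schedule.append(bits)
    return schedule
```

The cooldown mimics the "progressive" aspect: precision drops gradually rather than in one jump, so the network adapts to each intermediate bit-width before the next reduction.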
📝 Abstract
Diffusion models excel at image generation but are computationally and resource-intensive due to their reliance on iterative Markov chain processes, which leads to error accumulation and limits the effectiveness of naive compression techniques. In this paper, we propose PQCAD-DM, a novel hybrid compression framework combining Progressive Quantization (PQ) and Calibration-Assisted Distillation (CAD) to address these challenges. PQ employs a two-stage quantization with adaptive bit-width transitions guided by a momentum-based mechanism, reducing excessive weight perturbations at low precision. CAD leverages full-precision calibration datasets during distillation, enabling the student to match full-precision performance even with a quantized teacher. As a result, PQCAD-DM balances computational efficiency and generative quality, halving inference time while maintaining competitive performance. Extensive experiments validate PQCAD-DM's superior generative capability and efficiency across diverse datasets, outperforming fixed-bit quantization methods.
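The calibration-assisted idea can be made concrete with a minimal sketch (assumptions, not the authors' code: a single linear layer, uniform symmetric fake quantization as a stand-in for PQ, and plain gradient descent on an MSE distillation loss). The key point mirrored from the abstract is that the student's regression targets come from a full-precision calibration set, so the supervision signal carries no quantization noise from the teacher:

```python
import numpy as np

def fake_quantize(w, bits):
    """Uniform symmetric fake quantization (illustrative stand-in for PQ)."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def distill_step(w_student, calib_x, fp_targets, lr=0.1):
    """One hypothetical CAD step for a linear layer: fit the (quantized)
    student's outputs on the calibration batch to full-precision targets."""
    pred = calib_x @ w_student
    # Gradient of mean-squared error with respect to the student weights.
    grad = 2.0 * calib_x.T @ (pred - fp_targets) / len(calib_x)
    return w_student - lr * grad
```

A quick usage check: quantize a random weight matrix to 4 bits, then run a few distillation steps against full-precision outputs; the calibration-set MSE should shrink, recovering part of the accuracy lost to quantization.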