🤖 AI Summary
Diffusion models suffer from high computational overhead, which hinders their deployment in resource-constrained settings; existing quantization methods neglect conditional inputs such as text prompts, leading to suboptimal trade-offs between efficiency and generation quality. To address this, we propose the first text-guided, condition-aware dynamic quantization framework for diffusion models. Our method leverages text embeddings to steer layer-wise, timestep-specific bit-width allocation, integrating embedding-space mapping with low-bit neural network techniques for fine-grained, temporally adaptive quantization control. Compatible with mainstream quantization paradigms, our approach reduces computational cost by up to 62% in FLOPs across multiple benchmarks while simultaneously improving generation fidelity, with an average FID reduction of 3.1, demonstrating effective co-optimization of inference efficiency and image quality.
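The core idea, a policy that maps a text embedding and a diffusion timestep to per-layer bit widths, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `allocate_bits`, the random stand-in for learned weights, and the assumption that later timesteps tolerate lower precision are all illustrative.

```python
import numpy as np

NUM_LAYERS = 4
BIT_CHOICES = np.array([4, 6, 8])  # candidate precisions per layer (assumption)

def allocate_bits(text_emb: np.ndarray, timestep: int, num_steps: int = 50) -> np.ndarray:
    """Score each (layer, bit-width) pair from the prompt embedding and the
    timestep, then pick the highest-scoring bit width for every layer."""
    rng = np.random.default_rng(0)  # stand-in for learned policy weights
    W = rng.standard_normal((NUM_LAYERS, len(BIT_CHOICES), text_emb.shape[0]))
    t_scale = 1.0 - timestep / num_steps        # assumption: early steps need more bits
    scores = W @ text_emb + t_scale * BIT_CHOICES  # bias toward higher bits early on
    return BIT_CHOICES[np.argmax(scores, axis=-1)]

emb = np.ones(16)                   # toy prompt embedding
bits = allocate_bits(emb, timestep=10)
print(bits)                         # one bit width per layer
```

In a full system, the scoring weights would be learned jointly with the quantized diffusion model rather than sampled at random; the sketch only shows the shape of the mapping from (prompt, timestep) to a per-layer precision assignment.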
📝 Abstract
Despite the success of diffusion models in image generation tasks such as text-to-image, their enormous computational cost limits their use in resource-constrained environments. To address this, network quantization has emerged as a promising approach for designing efficient diffusion models. However, existing diffusion model quantization methods do not treat input conditions, such as text prompts, as an essential source of information for quantization. In this paper, we propose a novel quantization method dubbed Quantization of Language-to-Image diffusion models using text Prompts (QLIP). QLIP leverages text prompts to guide the selection of bit precision for every layer at each time step. Moreover, QLIP can be seamlessly integrated into existing quantization methods to enhance their efficiency. Extensive experiments demonstrate the effectiveness of QLIP in reducing computational complexity and improving the quality of the generated images across various datasets.
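The "integrated into existing quantization methods" claim rests on a standard primitive: once a bit width is chosen for a layer, any uniform quantizer can apply it. Below is a hedged sketch of symmetric uniform quantization at a given bit width; the function name and the symmetric scheme are illustrative assumptions, not the paper's specific quantizer.

```python
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization: scale to the integer grid,
    round, clip to the representable range, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

w = np.linspace(-1.0, 1.0, 9)      # toy weight tensor
w8 = quantize_uniform(w, 8)
w4 = quantize_uniform(w, 4)
err8 = np.abs(w - w8).max()
err4 = np.abs(w - w4).max()
print(err4 >= err8)                # lower precision gives larger reconstruction error
```

A dynamic scheme like QLIP would call such a quantizer with different `bits` per layer and per timestep, which is why it composes with existing quantization pipelines.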