One-Step Diffusion-Based Image Compression with Semantic Distillation

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion-based image codecs suffer from high decoding latency because of iterative sampling, which hinders practical deployment. This paper proposes OneDC, a one-step diffusion-based generative image codec that removes iterative sampling by combining a latent compression module with a one-step diffusion generator, and that uses the hyperprior as a semantic guidance signal. The key contributions are: (1) a one-step design for diffusion-based image compression; (2) a semantic distillation mechanism that transfers knowledge from a pretrained generative tokenizer to the hyperprior codec; and (3) hybrid optimization in both the pixel and latent domains. Experiments show that OneDC achieves state-of-the-art perceptual quality, with over 40% bitrate reduction and 20× faster decoding compared to prior multi-step diffusion-based codecs.

📝 Abstract
While recent diffusion-based generative image codecs have shown impressive performance, their iterative sampling process introduces undesirable latency. In this work, we revisit the design of a diffusion-based codec and argue that multi-step sampling is not necessary for generative compression. Based on this insight, we propose OneDC, a One-step Diffusion-based generative image Codec that integrates a latent compression module with a one-step diffusion generator. Recognizing the critical role of semantic guidance in one-step diffusion, we propose using the hyperprior as a semantic signal, overcoming the limitations of text prompts in representing complex visual content. To further enhance the semantic capability of the hyperprior, we introduce a semantic distillation mechanism that transfers knowledge from a pretrained generative tokenizer to the hyperprior codec. Additionally, we adopt a hybrid pixel- and latent-domain optimization to jointly enhance both reconstruction fidelity and perceptual realism. Extensive experiments demonstrate that OneDC achieves state-of-the-art perceptual quality even with one-step generation, offering over 40% bitrate reduction and 20× faster decoding compared to prior multi-step diffusion-based codecs. Code will be released later.
Problem

Research questions and friction points this paper is trying to address.

Reducing latency in diffusion-based image compression
Enhancing semantic guidance without multi-step sampling
Improving reconstruction fidelity and perceptual realism
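The latency problem can be illustrated with a toy sketch that counts network evaluations during decoding. Everything here is hypothetical and for illustration only: `make_counting_denoiser`, `decode_multi_step`, and `decode_one_step` are invented names, and the "denoiser" is a trivial stand-in rather than a diffusion model or anything from OneDC.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Denoiser = Callable[[Vector, Vector], Vector]


def make_counting_denoiser() -> Tuple[Denoiser, Dict[str, int]]:
    """Return a toy denoiser plus a counter of how often it was called."""
    calls = {"n": 0}

    def denoise(x: Vector, cond: Vector) -> Vector:
        calls["n"] += 1
        # Toy update (assumption): move halfway toward the conditioning signal.
        return [xi + 0.5 * (ci - xi) for xi, ci in zip(x, cond)]

    return denoise, calls


def decode_multi_step(denoise: Denoiser, start: Vector, cond: Vector, steps: int) -> Vector:
    """Iterative sampling: one network evaluation per sampling step."""
    x = list(start)
    for _ in range(steps):
        x = denoise(x, cond)
    return x


def decode_one_step(denoise: Denoiser, latent: Vector, cond: Vector) -> Vector:
    """One-step generation: a single network evaluation."""
    return denoise(list(latent), cond)
```

In this sketch the decoding cost grows linearly with `steps`, while the one-step path always costs one evaluation. The step count passed to `decode_multi_step` is a free parameter of the toy, not a figure taken from the paper.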
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-step diffusion generator for image compression
Semantic distillation from pretrained generative tokenizer
Hybrid pixel- and latent-domain optimization
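As a rough illustration of what a hybrid pixel- and latent-domain objective could look like, the sketch below sums a rate term with one distortion term per domain. This is an assumption-laden toy, not the paper's loss: the abstract does not specify the individual terms, plain MSE is used here as a stand-in for both domains, and the names `mse`, `hybrid_loss`, `w_pixel`, `w_latent`, and `w_rate` are hypothetical.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty flat sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def hybrid_loss(
    pixels_ref: Sequence[float],
    pixels_rec: Sequence[float],
    latent_ref: Sequence[float],
    latent_rec: Sequence[float],
    bits: float,
    num_pixels: int,
    w_pixel: float = 1.0,   # hypothetical weight on the pixel-domain term
    w_latent: float = 1.0,  # hypothetical weight on the latent-domain term
    w_rate: float = 1.0,    # hypothetical weight on the rate term
) -> float:
    """Weighted sum of rate (bits per pixel) and per-domain distortion."""
    if num_pixels <= 0:
        raise ValueError("num_pixels must be positive")
    rate_bpp = bits / num_pixels
    pixel_term = mse(pixels_ref, pixels_rec)    # stand-in for pixel-domain loss
    latent_term = mse(latent_ref, latent_rec)   # stand-in for latent-domain loss
    return w_rate * rate_bpp + w_pixel * pixel_term + w_latent * latent_term
```

A perfect reconstruction leaves only the rate term, so the weights control how much bitrate is traded against error in each domain. A real perceptual codec would likely replace the MSE stand-ins with perceptual or distribution-matching losses.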
Authors

Naifu Xue, Communication University of China
Zhaoyang Jia, University of Science and Technology of China (video compression, digital watermarking)
Jiahao Li, Microsoft Research Asia
Bin Li, Microsoft Research Asia
Yuan Zhang, Communication University of China
Yan Lu, Microsoft Research Asia