One-Step Diffusion-Based Image Compression with Semantic Distillation

📅 2025-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Diffusion-based image codecs suffer from high decoding latency due to iterative sampling, which hinders practical deployment. This paper proposes OneDC, a one-step diffusion-based generative image codec that removes iterative sampling by integrating a latent compression module with a one-step diffusion generator, and that uses the hyperprior as a semantic guidance signal in place of text prompts. The key contributions are: (1) a one-step diffusion design for generative image compression; (2) a semantic distillation mechanism that transfers knowledge from a pretrained generative tokenizer to the hyperprior codec; and (3) a hybrid optimization over the pixel and latent domains. Experiments show that OneDC achieves state-of-the-art perceptual quality, with over 40% bitrate reduction and 20x faster decoding compared to prior multi-step diffusion-based codecs.

📝 Abstract

While recent diffusion-based generative image codecs have shown impressive performance, their iterative sampling process introduces undesirable latency. In this work, we revisit the design of a diffusion-based codec and argue that multi-step sampling is not necessary for generative compression. Based on this insight, we propose OneDC, a One-step Diffusion-based generative image Codec that integrates a latent compression module with a one-step diffusion generator. Recognizing the critical role of semantic guidance in one-step diffusion, we propose using the hyperprior as a semantic signal, overcoming the limitations of text prompts in representing complex visual content. To further enhance the semantic capability of the hyperprior, we introduce a semantic distillation mechanism that transfers knowledge from a pretrained generative tokenizer to the hyperprior codec. Additionally, we adopt a hybrid pixel- and latent-domain optimization to jointly enhance both reconstruction fidelity and perceptual realism. Extensive experiments demonstrate that OneDC achieves SOTA perceptual quality even with one-step generation, offering over 40% bitrate reduction and 20x faster decoding compared to prior multi-step diffusion-based codecs. Code will be released later.

Problem

Research questions and friction points this paper is trying to address.

Reducing latency in diffusion-based image compression

Enhancing semantic guidance without multi-step sampling

Improving reconstruction fidelity and perceptual realism

Innovation

Methods, ideas, or system contributions that make the work stand out.

One-step diffusion generator for image compression

Semantic distillation from pretrained generative tokenizer

Hybrid pixel- and latent-domain optimization
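
The summary above does not give the loss formulation, so the following is only a rough sketch of how a hybrid pixel- and latent-domain objective with a semantic distillation term might be combined with a rate term. Everything here is an assumption made for illustration: the function names, the use of MSE and cosine distance, the bits-per-pixel rate term, and all weights are hypothetical and are not taken from the paper.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))

def cosine_distance(a, b, eps=1e-8):
    """Mean (1 - cosine similarity) over the batch; near 0 when features align."""
    a = a.reshape(a.shape[0], -1)
    b = b.reshape(b.shape[0], -1)
    dot = np.sum(a * b, axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return float(np.mean(1.0 - dot / norms))

def hybrid_objective(x, x_hat, z, z_hat, hyper_feat, teacher_feat, total_bits,
                     w_pixel=1.0, w_latent=1.0, w_sem=0.1, w_rate=0.01):
    """Toy combined objective (hypothetical, not the paper's loss).

    x, x_hat:      original and reconstructed images, shape (N, H, W, C)
    z, z_hat:      target and generated latents, any matching shape
    hyper_feat:    features derived from the hyperprior (the "student")
    teacher_feat:  features from a pretrained tokenizer (the "teacher")
    total_bits:    estimated bits spent on the whole batch
    """
    n, h, w = x.shape[:3]
    terms = {
        "pixel": mse(x, x_hat),                                  # pixel-domain fidelity
        "latent": mse(z, z_hat),                                 # latent-domain agreement
        "semantic": cosine_distance(hyper_feat, teacher_feat),   # distillation term
        "rate_bpp": total_bits / (n * h * w),                    # bits per pixel
    }
    total = (w_pixel * terms["pixel"] + w_latent * terms["latent"]
             + w_sem * terms["semantic"] + w_rate * terms["rate_bpp"])
    return total, terms

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((2, 8, 8, 3))
    z = rng.random((2, 4, 4, 4))
    feat = rng.random((2, 16))
    total, terms = hybrid_objective(x, x + 0.05, z, z, feat, feat, total_bits=128.0)
    print(total, terms)
```

In this toy setup the weights trade off fidelity (pixel term), agreement in the latent space, semantic alignment with the teacher, and bitrate; a real training objective for a perceptual codec would likely use learned perceptual or adversarial terms rather than plain MSE.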
