Exploiting Semantic and Pixel Representations for Ultra-Low Bitrate Image Compression

📅 2026-05-31

🤖 AI Summary

This work addresses the challenge of balancing perceptual quality, semantic consistency, and pixel-level fidelity in image compression at ultra-low bitrates (<0.03 bpp), where existing methods often sacrifice pixel-level accuracy for visual realism. To this end, the authors propose SPRDiff, a diffusion-based compression method that integrates semantic and pixel-level representations. SPRDiff employs a triple-encoder architecture in which pretrained semantic-oriented and distortion-oriented encoders supply features that compensate for the limited representations of a frozen VAE encoder. It also introduces a distortion-aware reconstruction module that produces a coarse reconstruction and provides both semantic- and pixel-level conditional guidance to the diffusion model. On benchmark datasets, the method outperforms state-of-the-art approaches in the rate-distortion-perception trade-off at these bitrates, preserving both perceptual quality and pixel fidelity.
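To put the <0.03 bpp regime in perspective: bits per pixel (bpp) is the total bitstream size divided by the number of pixels. The short Python sketch below converts a bpp target into a total bit and byte budget. The 768x512 resolution is an assumed example (the size of the Kodak test images), not a setting taken from the paper, and `bit_budget` is a hypothetical helper.

```python
def bit_budget(bpp: float, height: int, width: int) -> tuple[float, float]:
    """Return (bits, bytes) available for an image coded at `bpp` bits per pixel."""
    bits = bpp * height * width  # bpp is defined as total bits / number of pixels
    return bits, bits / 8


# Assumed example: a 512x768 image (Kodak resolution) at the 0.03 bpp threshold.
bits, nbytes = bit_budget(0.03, 512, 768)
print(f"{bits:.0f} bits, about {nbytes:.0f} bytes")
```

At 0.03 bpp such an image gets roughly 11,800 bits (under 1.5 KB) for the whole picture, versus 24 bpp for uncompressed 8-bit RGB, an 800x reduction.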

📝 Abstract

Most existing extreme compression methods fail to achieve an optimal rate-distortion-perception trade-off, as they typically prioritize perceptual fidelity and visual realism over pixel-level accuracy. Consequently, the resulting reconstructions often deviate noticeably from the originals. For ultra-low bitrate image compression, it is therefore crucial not only to produce extremely compact representations but also to ensure that reconstructed images remain semantically coherent and faithful to the source at the pixel level. To this end, we propose SPRDiff, a diffusion-based compression method that fully leverages both semantic and pixel representations, thereby enhancing reconstruction fidelity under ultra-low bitrate constraints. Specifically, we develop a triple-encoder architecture that utilizes high-fidelity features from pretrained distortion-oriented and semantic-oriented encoders to compensate for the limited representations extracted by the frozen VAE encoder, thereby improving latent compression and entropy modeling. To further enhance the reconstruction fidelity of diffusion models, we introduce a distortion-aware reconstruction module with dual feature extraction. This module not only generates a coarse reconstruction that preserves the main structures, but also provides practical and accurate semantic- and pixel-level conditional signals to guide the diffusion model. Extensive experiments on benchmark datasets demonstrate that our method outperforms state-of-the-art approaches in the rate-distortion-perception trade-off at extremely low bitrates (below 0.03 bpp), effectively preserving both perceptual quality and pixel-wise fidelity in the reconstructed images. We will release the source code and trained models at https://github.com/cshw2021/SPRDiff.
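The rate-distortion-perception trade-off discussed above is commonly optimized in learned compression as a weighted sum of a rate term, a distortion term, and a perceptual term. The Python sketch below shows that generic form under stated assumptions; it is not SPRDiff's published training loss. Rate is estimated from an entropy model's per-symbol likelihoods as bits per pixel, distortion is mean squared error, and the perceptual score (e.g. from LPIPS) is passed in as a precomputed number. The helper names and the weights `lambda_d` and `lambda_p` are hypothetical.

```python
import math
from typing import Sequence


def rate_bpp(likelihoods: Sequence[float], num_pixels: int) -> float:
    """Estimated rate: information content -sum(log2 p) of the coded symbols, per pixel."""
    return -sum(math.log2(p) for p in likelihoods) / num_pixels


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Pixel-level distortion as mean squared error."""
    if len(x) != len(x_hat):
        raise ValueError("x and x_hat must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def rdp_loss(likelihoods: Sequence[float], num_pixels: int,
             x: Sequence[float], x_hat: Sequence[float], perceptual: float,
             lambda_d: float = 1.0, lambda_p: float = 1.0) -> float:
    """Generic objective R + lambda_d * D + lambda_p * P (hypothetical weighting)."""
    return rate_bpp(likelihoods, num_pixels) + lambda_d * mse(x, x_hat) + lambda_p * perceptual


# Toy numbers: 4 latent symbols with probability 1/2 each, coding a 4-pixel "image".
loss = rdp_loss([0.5] * 4, num_pixels=4,
                x=[0.0, 1.0, 0.5, 0.25], x_hat=[0.0, 0.5, 0.5, 0.25],
                perceptual=0.1, lambda_d=2.0, lambda_p=0.5)
```

Raising `lambda_d` relative to `lambda_p` pushes reconstructions toward pixel fidelity, while raising `lambda_p` favors realism; balancing the two at a fixed low rate is the tension the paper targets.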

Problem

Research questions and friction points this paper is trying to address.

ultra-low bitrate image compression

rate-distortion-perception trade-off

pixel-level fidelity

semantic coherence

image reconstruction

Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion-based compression

semantic representation

pixel-level fidelity