🤖 AI Summary
Existing neural image compression methods rarely offer a practical, controllable trade-off among rate, distortion, and perceptual quality (RDP), primarily because of the difficulty of introducing common randomness at the decoder. This work proposes DCIC, a diffusion-based framework that, at a fixed bitrate, guides the denoising process through dual distortion and idempotence constraints, while consistent noise injection realizes common randomness at no additional rate cost. A single bitstream thereby yields continuously adjustable fidelity-realism trade-offs across the RDP surface. The method is evaluated with CNN, Transformer, and hybrid architectures; experiments on CelebA-HQ, CLIC2020, and ImageNet-1K show that DCIC$_{RDP}$ achieves better BD-PSNR than all perceptual codecs, while DCIC$_{RP}$ matches dedicated perception-oriented methods in BD-FID.
📝 Abstract
The rate-distortion-perception (RDP) trade-off extends classical rate--distortion theory by imposing a distributional constraint on reconstructions, providing a unified framework for neural image compression that jointly governs fidelity and perceptual realism. While prior work achieves near-optimal rate--perception trade-offs, practical frameworks explicitly realizing the full RDP surface remain scarce, primarily due to the difficulty of introducing common randomness at the decoder. We propose DCIC (Dual-Constrained Diffusion Image Compression), which integrates a learned codec with a diffusion-based decoder governed by joint distortion and idempotence constraints. The distortion constraint bounds reconstruction fidelity relative to the base codec output; the idempotence constraint -- requiring that re-encoding the restored image recovers the base codec reconstruction -- serves as a tractable surrogate for the distributional perception requirement. Together, they steer the reverse denoising process via iterative optimization with consistent noise injection, realizing common randomness without additional rate overhead. At fixed rate, dual attenuation factors $(K_D, K_P)$ jointly navigate the Pareto frontier of the distortion-perception plane, enabling continuously adjustable fidelity-realism trade-offs from a single bitstream. DCIC$_{RD}$ ($K_P{=}0$) and DCIC$_{RP}$ ($K_D{=}0$) arise as boundary curves, with DCIC$_{RDP}$ ($K_D{=}K_P{=}1$) realizing the optimal interior operating point. Experiments on CelebA-HQ, CLIC2020, and ImageNet-1K across CNN, Transformer, and hybrid architectures confirm that DCIC$_{RDP}$ achieves superior BD-PSNR over all perceptual codecs, while DCIC$_{RP}$ matches dedicated perception-oriented methods in BD-FID, validating the practical value of full RDP surface navigation.
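The two mechanisms named in the abstract, dual-constraint guidance and consistent noise injection, can be illustrated with a toy sketch. It is not the authors' implementation, and several pieces are assumptions made purely for illustration: there is no learned diffusion denoiser, `toy_codec` is a hypothetical stand-in for encode-then-decode (pairwise averaging, chosen because it is idempotent), $K_D$ and $K_P$ are treated as plain weights on the two gradient terms, and the noise seed is derived by hashing the bitstream. The point is only that noise derived from the bitstream itself is reproducible by any decoder and needs no extra bits.

```python
import hashlib
import random


def noise_from_bitstream(bitstream, step, n):
    """Gaussian noise for one step, derived only from the bitstream (assumed scheme)."""
    digest = hashlib.sha256(bitstream + step.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def toy_codec(x):
    """Hypothetical encode+decode: replace each pair of samples by its mean.

    This is a linear projection, so toy_codec(toy_codec(x)) == toy_codec(x).
    The input length must be even.
    """
    out = []
    for i in range(0, len(x), 2):
        mean = 0.5 * (x[i] + x[i + 1])
        out += [mean, mean]
    return out


def guided_step(x, y_base, k_d, k_p, noise, sigma, lr=0.5):
    """One guidance update; y_base must itself be a toy_codec output.

    Distortion term:  gradient of 0.5 * ||x - y_base||^2            = x - y_base
    Idempotence term: gradient of 0.5 * ||toy_codec(x) - y_base||^2 = toy_codec(x) - y_base
    (the second identity holds because toy_codec is a symmetric projection
    and y_base lies in its range).
    """
    px = toy_codec(x)
    return [
        xi - lr * (k_d * (xi - yi) + k_p * (pi - yi)) + sigma * ni
        for xi, yi, pi, ni in zip(x, y_base, px, noise)
    ]


def decode(bitstream, y_base, k_d, k_p, steps=20):
    """Run the toy guided process with noise reproducible from the bitstream."""
    n = len(y_base)
    x = noise_from_bitstream(bitstream, 0, n)  # start from seeded noise
    for t in range(1, steps + 1):
        sigma = 0.1 * (1.0 - t / steps)  # noise level annealed to zero
        x = guided_step(x, y_base, k_d, k_p,
                        noise_from_bitstream(bitstream, t, n), sigma)
    return x


bitstream = b"example-bitstream"  # placeholder bytes, not a real codec output
y_base = toy_codec([0.2, 0.4, 0.9, 0.7, 0.1, 0.3])  # base codec reconstruction

x_rd = decode(bitstream, y_base, k_d=1.0, k_p=0.0)   # distortion constraint only
x_rp = decode(bitstream, y_base, k_d=0.0, k_p=1.0)   # idempotence constraint only
x_rdp = decode(bitstream, y_base, k_d=1.0, k_p=1.0)  # both constraints
```

In this sketch, setting `k_d=0` leaves the component of `x` that `toy_codec` discards unconstrained, so the seeded noise survives as detail while re-encoding still approximately returns `y_base`. Setting `k_p=0` instead pulls every sample toward `y_base` directly.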