What Matters in Practical Learned Image Compression

📅 2026-05-06

🤖 AI Summary
Existing learned image codecs struggle to deliver high perceptual quality and real-time performance on edge devices at the same time. This work systematically studies the modeling choices that govern a practical learned image codec, jointly optimized for perceptual quality and runtime, and introduces several novel techniques within its ablations. It then runs a performance-aware neural architecture search (NAS) over millions of backbone configurations to find models that meet a target on-device runtime while maximizing compression performance as measured by perceptual metrics. On an iPhone 17 Pro Max, the resulting codec encodes 12MP images in as little as 230 ms and decodes them in 150 ms. In subjective user studies, it provides 2.3-3x bitrate savings against AV1, AV2, VVC, ECM and JPEG-AI, and 20-40% bitrate savings against the best learned codec alternatives.
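To put the headline figures on a common scale, the short Python sketch below converts between "N-times" savings and percentage reductions, and turns the reported latencies into throughput. It assumes that "2.3-3x bitrate savings" means the reference codec needs 2.3 to 3 times as many bits at matched subjective quality; the paper listing does not spell this out, and the function names are illustrative only.

```python
def reduction_from_ratio(ratio: float) -> float:
    """Fractional bitrate reduction, ASSUMING the reference needs `ratio` times the bits."""
    return 1.0 - 1.0 / ratio


def ratio_from_reduction(reduction: float) -> float:
    """Inverse mapping: a fractional reduction expressed as an 'N-times' factor."""
    return 1.0 / (1.0 - reduction)


def throughput_mp_per_s(megapixels: float, latency_ms: float) -> float:
    """Megapixels processed per second for a given per-image latency."""
    return megapixels / (latency_ms / 1000.0)


if __name__ == "__main__":
    for r in (2.3, 3.0):
        print(f"{r}x savings  -> {reduction_from_ratio(r):.1%} fewer bits")
    for p in (0.20, 0.40):
        print(f"{p:.0%} savings -> {ratio_from_reduction(p):.2f}x factor")
    print(f"encode: {throughput_mp_per_s(12, 230):.1f} MP/s")
    print(f"decode: {throughput_mp_per_s(12, 150):.1f} MP/s")
```

Under that reading, the gap to traditional codecs corresponds to roughly 57% to 67% fewer bits, while the 20-40% gain over learned codecs corresponds to a factor of about 1.25x to 1.67x.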
📝 Abstract
One of the major differentiators unlocked by learned codecs relative to their hard-coded traditional counterparts is their ability to be optimized directly to appeal to the human visual system. Despite this potential, a perceptual yet practical image codec has yet to be proposed. In this work, we aim to close this gap. We conduct a comprehensive study of the key modeling choices that govern the design of a practical learned image codec, jointly optimized for perceptual quality and runtime, and we introduce several novel techniques within the ablations. We then perform performance-aware neural architecture search over millions of backbone configurations to identify models that achieve the target on-device runtime while maximizing compression performance as captured by perceptual metrics. We combine the various optimizations to construct a new codec that achieves a significantly improved tradeoff between speed and perceptual quality. Based on rigorous subjective user studies, it provides 2.3-3x bitrate savings against AV1, AV2, VVC, ECM and JPEG-AI, and 20-40% bitrate savings against the best learned codec alternatives. At the same time, on an iPhone 17 Pro Max, it encodes 12MP images in as little as 230 ms and decodes them in 150 ms, faster than most top ML-based codecs run on a V100 GPU.
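The abstract describes the architecture search only at a high level. As a rough illustration of searching under a latency budget, the sketch below runs a plain random search: it samples backbone configurations, discards those whose estimated latency exceeds the budget, and keeps the feasible one with the best quality score. Everything here is a hypothetical stand-in and not the paper's method: `BackboneConfig`, the search space, and both proxy functions are invented for illustration. A real search would measure latency on the target device and score trained models with perceptual metrics.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackboneConfig:
    """Hypothetical backbone description; not taken from the paper."""
    channels: int
    depth: int
    kernel: int


# Hypothetical search space (assumption for illustration).
CHANNELS = (32, 64, 96, 128, 192, 256)
DEPTHS = (2, 4, 6, 8)
KERNELS = (3, 5)


def estimate_latency_ms(cfg: BackboneConfig) -> float:
    """Toy latency proxy: cost grows with width, depth and kernel area."""
    return 0.05 * cfg.channels * cfg.depth * cfg.kernel ** 2 / 9.0


def estimate_quality(cfg: BackboneConfig) -> float:
    """Toy quality proxy with diminishing returns in width and depth."""
    return math.log(cfg.channels) + 0.5 * math.log(cfg.depth) + 0.1 * cfg.kernel


def search(budget_ms: float, n_samples: int, seed: int = 0) -> Optional[BackboneConfig]:
    """Return the best sampled config within the budget, or None if none fits."""
    rng = random.Random(seed)
    best: Optional[BackboneConfig] = None
    for _ in range(n_samples):
        cfg = BackboneConfig(rng.choice(CHANNELS), rng.choice(DEPTHS), rng.choice(KERNELS))
        if estimate_latency_ms(cfg) > budget_ms:
            continue  # violates the runtime constraint
        if best is None or estimate_quality(cfg) > estimate_quality(best):
            best = cfg
    return best


if __name__ == "__main__":
    print(search(budget_ms=150.0, n_samples=500))
```

Treating the runtime target as a hard constraint, rather than folding it into the score, keeps every returned candidate deployable; a search over millions of configurations would likely need something more sample-efficient than uniform random sampling.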
Problem

Research questions and friction points this paper is trying to address.

learned image compression
perceptual quality
practical codec
runtime efficiency
human visual system
Innovation

Methods, ideas, or system contributions that make the work stand out.

learned image compression
perceptual quality
neural architecture search
runtime-aware optimization
subjective evaluation