ChWDTA: Channel-wise Wavelet-Domain Transformer Attention and Entropy Modeling for Learned Image Compression

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work targets the rate-distortion performance of learned image compression with a channel-wise wavelet-domain Transformer architecture. The method keeps windowed spatial self-attention but computes the query, key, and value projections on channel-wise wavelet-transformed features, which sparsifies the channel covariance seen by those projections, and then maps the attention output back with the inverse transform. For entropy coding, a channel-wise wavelet packet decomposition produces four equal-sized subbands that better fit slice-based autoregressive entropy modeling. With eight slices (two per subband), the scheme reports BD-rate reductions of −17.82%, −19.15%, and −22.56% on the Kodak, CLIC Professional Validation, and Tecnick test sets, respectively.
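The attention path described above can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation. It assumes a one-level orthonormal Haar transform along the channel axis, a single attention head, and one window of tokens. The names `haar_forward`, `haar_inverse`, and `wavelet_domain_window_attention`, and the weight matrices `Wq`, `Wk`, `Wv`, are hypothetical.

```python
import math

def haar_forward(v):
    """One-level orthonormal Haar transform of a channel vector (even length)."""
    s = 1.0 / math.sqrt(2.0)
    half = len(v) // 2
    low = [(v[2 * i] + v[2 * i + 1]) * s for i in range(half)]
    high = [(v[2 * i] - v[2 * i + 1]) * s for i in range(half)]
    return low + high

def haar_inverse(w):
    """Inverse of haar_forward: rebuilds the channel vector from [low | high]."""
    s = 1.0 / math.sqrt(2.0)
    half = len(w) // 2
    out = []
    for i in range(half):
        out.append((w[i] + w[half + i]) * s)
        out.append((w[i] - w[half + i]) * s)
    return out

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [x / z for x in e]

def wavelet_domain_window_attention(tokens, Wq, Wk, Wv):
    """tokens: list of channel vectors belonging to one attention window.

    Assumption: Q/K/V are linear maps of the Haar coefficients, and the
    attention output is mapped back with the inverse Haar transform.
    """
    coeffs = [haar_forward(t) for t in tokens]      # channel-wise transform
    Q = [matvec(Wq, c) for c in coeffs]
    K = [matvec(Wk, c) for c in coeffs]
    V = [matvec(Wv, c) for c in coeffs]
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
        weights = softmax(scores)                   # attention over the window
        mixed = [sum(w * v[j] for w, v in zip(weights, V))
                 for j in range(len(V[0]))]
        out.append(haar_inverse(mixed))             # back to the feature domain
    return out
```

The spatial tokenization is untouched in this sketch: the transform acts only along the channel axis of each token, so the window still contains the same tokens at the same positions.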

📝 Abstract

State-of-the-art learned image compression (LIC) schemes are increasingly based on hybrid CNN-transformer architectures. To further improve rate-distortion performance, we introduce channel-wise wavelet transforms into both the transformer and entropy-coding components. First, we propose a channel-wise wavelet-domain transformer attention (ChWDTA) mechanism. ChWDTA keeps the efficient windowed spatial self-attention used in modern LIC backbones, but computes the Q/K/V projections on channel-wise wavelet-transformed features before mapping the attention output back with the inverse transform. The resulting Channel-wise Wavelet-Domain Transformer Block (ChWDTB) therefore preserves the spatial tokenization pattern of windowed attention while sparsifying the channel covariance seen by the attention projections. Second, in the entropy-coding stage, we introduce a channel-wise wavelet packet (ChWP) decomposition that produces four equal-sized subbands, which better fit channel-wise slice-based autoregressive entropy modeling. Dividing each of the four channel-wise subbands into two slices gives eight slices for entropy coding. With this configuration, the proposed scheme obtains BD-rate reductions of -17.82%, -19.15%, and -22.56% on the Kodak, CLIC Professional Validation, and Tecnick test sets, respectively. Even when each channel-wise subband is coded as a single slice, the scheme still retains most of the coding gains with lower complexity. The results confirm the advantage of introducing wavelet transforms into CNN-transformer-based LIC schemes.
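The slice arithmetic in the entropy-coding stage can be made concrete with a small sketch. It assumes the wavelet packet is a full two-level Haar decomposition along the channel axis, which is one way to obtain four equal-sized subbands. The abstract does not state the filter or the depth, so this choice, the 16-channel toy vector, and the names `haar_split`, `wavelet_packet_4`, and `make_slices` are all hypothetical.

```python
import math

def haar_split(v):
    """Split a channel vector (even length) into low and high Haar halves."""
    s = 1.0 / math.sqrt(2.0)
    half = len(v) // 2
    low = [(v[2 * i] + v[2 * i + 1]) * s for i in range(half)]
    high = [(v[2 * i] - v[2 * i + 1]) * s for i in range(half)]
    return low, high

def wavelet_packet_4(v):
    """Two-level wavelet packet: BOTH halves are split again.

    A plain dyadic wavelet transform would re-split only the low band and
    give subbands of sizes C/2, C/4, C/4; the packet form gives four
    subbands of size C/4 each.
    """
    low, high = haar_split(v)
    ll, lh = haar_split(low)
    hl, hh = haar_split(high)
    return [ll, lh, hl, hh]

def make_slices(subbands, slices_per_subband):
    """Cut every subband into equally sized contiguous channel slices."""
    slices = []
    for band in subbands:
        step = len(band) // slices_per_subband
        for i in range(slices_per_subband):
            slices.append(band[i * step:(i + 1) * step])
    return slices

# Toy latent with 16 channels at one spatial position (illustrative only).
latent = [float(c) for c in range(16)]
subbands = wavelet_packet_4(latent)
eight = make_slices(subbands, 2)   # two slices per subband
four = make_slices(subbands, 1)    # one slice per subband
print(len(subbands), len(eight), len(four))
```

The channel count must be divisible by four for the subbands to be equal in size, and each subband length must be divisible by the number of slices per subband.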

Problem

Research questions and friction points this paper is trying to address.

learned image compression

rate-distortion performance

CNN-transformer architecture

entropy modeling

wavelet transform

Innovation

Methods, ideas, or system contributions that make the work stand out.

Channel-wise Wavelet Transform

Transformer Attention

Entropy Modeling