Window-based Channel Attention for Wavelet-enhanced Learned Image Compression

📅 2024-09-21

🏛️ Asian Conference on Computer Vision

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the limited global modeling capability and poor detail preservation of existing learned image compression models, this paper proposes a spatial-channel hybrid attention framework. First, it introduces window partitioning into channel attention, yielding Window-based Channel Attention that obtains large receptive fields and captures more global information. Second, it combines spatial and channel attention into a joint spatial-channel mechanism and adds Discrete Wavelet Transform (DWT)-based, frequency-dependent downsampling that further enlarges the receptive field. Within a Transformer-based codec, the method therefore captures global correlations with large receptive fields while preserving local detail with small receptive fields. On four standard benchmark datasets, it achieves an average BD-rate reduction of 22.39% relative to VTM-23.1, establishing new state-of-the-art performance in learned image compression.
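The sketch below illustrates the window-based channel attention described above. It is not the authors' implementation. It assumes single-head attention with no learned query, key, or value projections, non-overlapping square windows, scores scaled by 1/sqrt(w·w) (a standard choice assumed here), and a [C][H][W] feature map stored as nested lists. The function name window_channel_attention is hypothetical. Inside each window, every channel becomes one token whose features are that channel's flattened patch. The attention matrix is therefore C×C rather than (w·w)×(w·w).

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def window_channel_attention(x, window):
    """Hypothetical sketch: channel attention applied inside each window.

    x: nested list of shape [C][H][W]; H and W must be divisible by `window`.
    Returns a nested list of the same shape.
    Assumption: queries, keys and values are the raw features (no learned
    projections), and a single attention head is used.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if height % window or width % window:
        raise ValueError("H and W must be divisible by the window size")
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    dim = window * window  # each channel token is a flattened window patch
    scale = 1.0 / math.sqrt(dim)
    for top in range(0, height, window):
        for left in range(0, width, window):
            # One token per channel, holding that channel's values in this window.
            tokens = [
                [x[c][top + i][left + j] for i in range(window) for j in range(window)]
                for c in range(channels)
            ]
            for c in range(channels):
                # Row c of the C x C attention matrix for this window.
                scores = [scale * sum(a * b for a, b in zip(tokens[c], tokens[k]))
                          for k in range(channels)]
                weights = softmax(scores)
                # Output token = attention-weighted mix of all channel tokens.
                mixed = [sum(weights[k] * tokens[k][p] for k in range(channels))
                         for p in range(dim)]
                for p, value in enumerate(mixed):
                    out[c][top + p // window][left + p % window] = value
    return out


feature_map = [[[float(c + 4 * i + j) for j in range(4)] for i in range(4)] for c in range(3)]
result = window_channel_attention(feature_map, window=2)
print(len(result), len(result[0]), len(result[0][0]))  # 3 4 4
```

Computing the C×C scores costs on the order of C²·w² per window. Spatial self-attention over the same window costs C·w⁴. Channel attention can therefore afford larger windows whenever C is smaller than w². This page does not state which window sizes the paper uses.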

📝 Abstract

Learned Image Compression (LIC) models have achieved better rate-distortion performance than traditional codecs. Existing LIC models use CNN, Transformer, or mixed CNN-Transformer architectures as their basic blocks. However, because of the limitations of shifted window attention, Swin-Transformer-based LIC exhibits restricted receptive-field growth, which impairs its ability to model large objects for image compression. To address this issue and improve performance, we incorporate window partition into channel attention for the first time to obtain large receptive fields and capture more global information. Since channel attention hinders local information learning, it is important to extend the existing attention mechanisms in Transformer codecs to spatial-channel attention with multiple receptive fields. Such attention can capture global correlations with large receptive fields while maintaining a detailed characterization of local correlations with small receptive fields. We also incorporate the discrete wavelet transform into our Spatial-Channel Hybrid (SCH) framework for efficient frequency-dependent down-sampling and further enlarged receptive fields. Experimental results demonstrate that our method achieves state-of-the-art performance, reducing BD-rate by 18.54%, 23.98%, 22.33%, and 24.71% on four standard datasets compared to VTM-23.1.
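As a quick arithmetic check, the four per-dataset BD-rate reductions reported in the abstract average to the 22.39% quoted in the AI summary. This assumes the summary uses an unweighted mean; the page does not say how the average was computed. The dataset names are not given here, so the values are kept in the abstract's order.

```python
# Per-dataset BD-rate reductions vs. VTM-23.1, in percent, as listed in the abstract.
per_dataset_bd_rate = [18.54, 23.98, 22.33, 24.71]


def mean(values):
    """Unweighted arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


average = round(mean(per_dataset_bd_rate), 2)
print(f"Average BD-rate reduction: {average:.2f}%")  # Average BD-rate reduction: 22.39%
```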

Problem

Research questions and friction points this paper is trying to address.

Image Compression

Visual Information Processing

Detail Retention

Innovation

Methods, ideas, or system contributions that make the work stand out.

Window-based Channel Attention

Discrete Wavelet Transform

Swin-Transformer Enhancement
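According to the abstract, the discrete wavelet transform listed above provides frequency-dependent down-sampling inside the SCH framework. The sketch below assumes a single-level orthonormal Haar wavelet, because the page does not name the wavelet. The function name haar_dwt_downsample is hypothetical. Each 2×2 block maps to one low-frequency (LL) coefficient and three high-frequency (LH, HL, HH) coefficients. This halves height and width while quadrupling the channel count, so no information is discarded.

```python
def haar_dwt_downsample(x):
    """Hypothetical sketch: one-level 2D Haar DWT used as a down-sampling layer.

    x: nested list [C][H][W] with even H and W.
    Returns [4C][H/2][W/2]: for each input channel, the LL, LH, HL and HH
    sub-bands stacked along the channel axis. The 1/2 factor makes the
    transform orthonormal, so total energy (sum of squares) is preserved.
    """
    out = []
    for channel in x:
        height, width = len(channel), len(channel[0])
        if height % 2 or width % 2:
            raise ValueError("H and W must be even")
        bands = {name: [] for name in ("LL", "LH", "HL", "HH")}
        for i in range(0, height, 2):
            rows = {name: [] for name in bands}
            for j in range(0, width, 2):
                a, b = channel[i][j], channel[i][j + 1]
                c, d = channel[i + 1][j], channel[i + 1][j + 1]
                rows["LL"].append((a + b + c + d) / 2)  # low-pass in both directions
                rows["LH"].append((a - b + c - d) / 2)  # differences across columns
                rows["HL"].append((a + b - c - d) / 2)  # differences across rows
                rows["HH"].append((a - b - c + d) / 2)  # diagonal detail
            for name in bands:
                bands[name].append(rows[name])
        out.extend(bands[name] for name in ("LL", "LH", "HL", "HH"))
    return out


subbands = haar_dwt_downsample([[[1.0, 2.0], [3.0, 4.0]]])
print(subbands)  # [[[5.0]], [[-1.0]], [[-2.0]], [[0.0]]]
```

Sub-band naming conventions vary between libraries, particularly which of LH and HL denotes horizontal detail. The labels here are only illustrative.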
