FEDS: Feature and Entropy-Based Distillation Strategy for Efficient Learned Image Compression

📅 2025-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the large parameter counts, slow inference, and poor deployability of learned image compression (LIC) models, this paper proposes a model-lightweighting method built on a Swin-Transformer V2 teacher model and a Feature and Entropy-based Distillation Strategy (FEDS). The approach introduces an entropy-weighted channel-level distillation mechanism that aligns attention-aware intermediate features while weighting latent channels by their information entropy. A three-stage progressive knowledge transfer framework further improves distillation efficacy. The resulting student model incurs only marginal BD-Rate increases of 1.24%, 1.17%, and 0.55% on the Kodak, Tecnick, and CLIC benchmarks, respectively, while reducing parameters by about 63% and accelerating encoding/decoding by around 73%. Notably, the method generalizes well across diverse Transformer-based LIC architectures.

📝 Abstract
Learned image compression (LIC) methods have recently outperformed traditional codecs such as VVC in rate-distortion performance. However, their large models and high computational costs have limited practical adoption. In this paper, we first construct a high-capacity teacher model by integrating Swin-Transformer V2-based attention modules, additional residual blocks, and expanded latent channels, thus achieving enhanced compression performance. Building on this foundation, we propose a Feature and Entropy-based Distillation Strategy (FEDS) that transfers key knowledge from the teacher to a lightweight student model. Specifically, we align intermediate feature representations and emphasize the most informative latent channels through an entropy-based loss. A staged training scheme refines this transfer in three phases: feature alignment, channel-level distillation, and final fine-tuning. Our student model nearly matches the teacher across Kodak (1.24% BD-Rate increase), Tecnick (1.17%), and CLIC (0.55%) while cutting parameters by about 63% and accelerating encoding/decoding by around 73%. Moreover, ablation studies indicate that FEDS generalizes effectively to transformer-based networks. The experimental results demonstrate that our approach strikes a compelling balance among compression performance, speed, and model size, making it well-suited for real-time or resource-limited scenarios.
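The three-phase training scheme mentioned in the abstract (feature alignment, channel-level distillation, final fine-tuning) can be pictured as a loss-weight schedule. The sketch below is a minimal illustration; the weights, equal phase lengths, and function name are assumptions for clarity, not the paper's actual values:

```python
def staged_training_schedule(epoch, total_epochs):
    """Return loss weights (feature_align, channel_distill, rate_distortion)
    for a three-phase distillation scheme. Phase boundaries (thirds of
    training) and weight values are illustrative assumptions."""
    phase = min(2, 3 * epoch // total_epochs)
    if phase == 0:
        # Phase 1: align the student's intermediate features with the teacher's.
        return (1.0, 0.0, 1.0)
    elif phase == 1:
        # Phase 2: add entropy-weighted channel-level distillation in latent space.
        return (0.5, 1.0, 1.0)
    else:
        # Phase 3: fine-tune on the rate-distortion objective alone.
        return (0.0, 0.0, 1.0)
```

At each epoch the returned tuple would scale the corresponding loss terms before they are summed into the total training objective.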
Problem

Research questions and friction points this paper is trying to address.

Large parameter counts and high computational cost limit the practical deployment of learned image compression models.
Transferring knowledge from a high-capacity teacher to a lightweight student without sacrificing rate-distortion performance.
Balancing compression performance, encoding/decoding speed, and model size.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Swin-Transformer V2-based attention modules
Feature and Entropy-Based Distillation Strategy
Staged training scheme for knowledge transfer
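The entropy-based distillation idea listed above can be sketched as a weighted latent-space loss: channels that the teacher's entropy model deems more informative (higher estimated bit cost) receive larger distillation weight. This is a minimal NumPy illustration under stated assumptions; the function name, tensor shapes, and weighting scheme are hypothetical, not the paper's implementation:

```python
import numpy as np

def entropy_weighted_distillation_loss(student_latent, teacher_latent,
                                       teacher_likelihoods):
    """Entropy-weighted channel-level distillation loss (illustrative sketch).

    student_latent, teacher_latent: arrays of shape (C, H, W).
    teacher_likelihoods: per-element likelihoods of shape (C, H, W),
    as estimated by the teacher's entropy model (assumed available).
    """
    # Per-channel bit cost: -log2 p averaged over spatial positions.
    bits = -np.log2(np.clip(teacher_likelihoods, 1e-9, 1.0)).mean(axis=(1, 2))
    # Normalize to weights: high-entropy channels get more distillation weight.
    weights = bits / bits.sum()
    # Per-channel MSE between student and teacher latents.
    channel_mse = ((student_latent - teacher_latent) ** 2).mean(axis=(1, 2))
    return float((weights * channel_mse).sum())
```

In training, such a term would be combined with feature-alignment and rate-distortion losses according to the staged scheme listed above.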