Recursive Vision Transformer with Dynamic Depth and Width Adjustment for Resource-Efficient Image Semantic Communication

📅 2026-05-27

🤖 AI Summary

This work addresses the large memory footprint and high computational complexity that hinder the deployment of image semantic communication systems on resource-constrained devices. The authors propose a lightweight framework built on a recursive Vision Transformer (ViT) architecture, in which a recursive structure iteratively refines semantic features and thereby reduces the parameter count. Three adaptive strategies (dynamic depth adjustment, dynamic width adjustment, and joint width-depth optimization) then adapt the amount of computation to image content and channel conditions. Simulation results show that, at comparable computational complexity, the proposed method reduces the parameter count by 48.7% and achieves higher image reconstruction quality than existing baselines.
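To illustrate why a recursive structure lowers the parameter count, the sketch below counts the weights of a standard Transformer encoder block and compares a conventional stack of distinct blocks with a single block that is reused at every step. The block layout, the `mlp_ratio` of 4, and the example sizes are assumptions made for illustration; they are not the paper's configuration, and the sketch does not attempt to reproduce the reported 48.7% figure, which refers to the authors' full system.

```python
def block_params(d_model: int, mlp_ratio: int = 4) -> int:
    """Weights and biases of one standard Transformer encoder block (assumed layout)."""
    # Four d_model x d_model projections (query, key, value, output), each with a bias.
    attn = 4 * (d_model * d_model + d_model)
    # Two-layer MLP: d_model -> hidden -> d_model, each layer with a bias.
    hidden = mlp_ratio * d_model
    mlp = (d_model * hidden + hidden) + (hidden * d_model + d_model)
    # Two LayerNorms, each with a scale and a shift vector.
    norms = 2 * 2 * d_model
    return attn + mlp + norms


def stacked_params(depth: int, d_model: int) -> int:
    """Conventional encoder: every layer owns its own block."""
    return depth * block_params(d_model)


def recursive_params(unique_blocks: int, d_model: int) -> int:
    """Recursive encoder: only the distinct blocks are stored, however often they run."""
    return unique_blocks * block_params(d_model)


if __name__ == "__main__":
    d_model, depth = 256, 8  # hypothetical sizes
    full = stacked_params(depth, d_model)
    shared = recursive_params(1, d_model)
    print(f"stacked:   {full} parameters")
    print(f"recursive: {shared} parameters")
    print(f"block-level saving: {1 - shared / full:.1%}")
```

Weight sharing leaves the computation per forward pass unchanged, since the shared block still runs once per recursion step. That is why the parameter saving is reported at comparable computational complexity, and why separate strategies are needed to reduce computation.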

📝 Abstract

Image semantic communication is a critical component in next-generation wireless communication systems. However, such systems typically suffer from large memory footprints and high computational complexity, making them difficult to deploy on resource-constrained devices. To address these challenges, we propose a vision transformer (ViT)-enabled image semantic communication system. In this system, a recursive structure is introduced to iteratively refine semantic features and reduce the parameter count. In addition, three dynamic adjustment strategies are designed to adaptively reduce computational complexity: dynamic depth adjustment, dynamic width adjustment, and joint width-depth optimization. Dynamic depth adjustment adaptively determines the number of recursive modules according to image content and channel conditions, while dynamic width adjustment selectively preserves important neurons and attention heads. The joint width-depth optimization further enables flexible computation configurations. Simulation results verify that the proposed recursive ViT-based system, combined with the three dynamic adjustment strategies, reduces the parameter count by 48.7% and achieves higher reconstruction quality than existing baselines under comparable computational complexity.
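The two adjustment mechanisms described in the abstract can be sketched in a few lines. The code below is a minimal, dependency-free illustration and not the authors' method: the early-exit rule (stop once the feature update falls below a tolerance), the `tol` and `keep_ratio` knobs, and the per-head importance scores are all hypothetical stand-ins for whatever policy the paper derives from image content and channel conditions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def refine_recursively(
    x: Vector,
    shared_block: Callable[[Vector], Vector],
    max_steps: int,
    tol: float,
) -> Tuple[Vector, int]:
    """Dynamic depth (assumed rule): reuse one block, exit once updates become small."""
    steps = 0
    for _ in range(max_steps):
        y = shared_block(x)
        steps += 1
        # Largest element-wise change produced by this recursion step.
        change = max(abs(a - b) for a, b in zip(x, y))
        x = y
        if change < tol:
            break
    return x, steps


def select_heads(importance: Sequence[float], keep_ratio: float) -> List[int]:
    """Dynamic width (assumed rule): keep the highest-scoring attention heads."""
    k = max(1, math.ceil(keep_ratio * len(importance)))
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy shared block that halves every feature; a real block would be a ViT layer.
    halve = lambda v: [0.5 * a for a in v]
    features, used = refine_recursively([1.0, -2.0], halve, max_steps=10, tol=0.1)
    print("recursion steps used:", used)
    print("heads kept:", select_heads([0.1, 0.9, 0.4, 0.7], keep_ratio=0.5))
```

In a joint width-depth setting, a controller would pick `max_steps` (or `tol`) and `keep_ratio` together for each image and channel state, trading one against the other to meet a computation budget.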

Problem

Research questions and friction points this paper is trying to address.

image semantic communication

resource-constrained devices

computational complexity

memory footprint

Innovation

Methods, ideas, or system contributions that make the work stand out.

Recursive Vision Transformer

Dynamic Depth Adjustment

Dynamic Width Adjustment

Semantic Communication