Text-Guided Token Communication for Wireless Image Transmission

📅 2025-07-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address perceptual quality degradation, semantic distortion, and the signal-to-noise ratio (SNR) “cliff effect” in image transmission under 6G low-bandwidth and harsh channel conditions, this paper proposes a text-guided discrete token communication paradigm. The method uses vision foundation models to map images into discrete visual tokens, protects those tokens with 5G NR polar codes, and incorporates textual semantic priors to guide token prediction and reconstruction at ultra-low bitrates. Its key contribution is the first integration of vision-language alignment into the communication pipeline, without requiring scene-specific retraining, which significantly mitigates performance collapse as SNR degrades. Experiments on ImageNet demonstrate that, at SNR > 0 dB, the method outperforms ADJSCC in perceptual fidelity (LPIPS reduced by 12.3%) and semantic consistency (CLIP Score increased by 8.7%), while exhibiting strong cross-dataset generalization.


📝 Abstract

With the emergence of 6G networks and the proliferation of visual applications, efficient image transmission under adverse channel conditions is critical. We present a text-guided token communication system that leverages pre-trained foundation models for low-bandwidth wireless image transmission. Our approach converts images to discrete tokens, applies 5G NR polar coding, and employs text-guided token prediction for reconstruction. Evaluations on ImageNet show that our method outperforms Deep Source Channel Coding with Attention Modules (ADJSCC) in perceptual quality and semantic preservation at Signal-to-Noise Ratios (SNRs) above 0 dB, while mitigating the cliff effect at lower SNRs. Our system requires no scenario-specific retraining and exhibits superior cross-dataset generalization, establishing a new paradigm for efficient image transmission aligned with human perceptual priorities.
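The token-level pipeline described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' system: the 10-bit token width (a 1024-entry codebook), the BPSK-over-AWGN channel model, and the names `tokens_to_bits`, `bits_to_tokens`, `transmit`, and `MASK` are all hypothetical. A single parity bit per token stands in for the 5G NR polar code and its error detection, and the text-guided predictor itself is not implemented; the sketch only shows how corrupted tokens could be flagged for later prediction instead of being decoded into wrong image content.

```python
import math
import random

BITS_PER_TOKEN = 10  # assumption: codebook of 2**10 = 1024 visual tokens
MASK = -1            # marks a token the receiver would hand to the predictor


def tokens_to_bits(tokens, width=BITS_PER_TOKEN):
    """Serialize token indices into a flat list of bits, MSB first."""
    return [(t >> i) & 1 for t in tokens for i in reversed(range(width))]


def bits_to_tokens(bits, width=BITS_PER_TOKEN):
    """Inverse of tokens_to_bits."""
    tokens = []
    for k in range(0, len(bits), width):
        value = 0
        for b in bits[k:k + width]:
            value = (value << 1) | b
        tokens.append(value)
    return tokens


def transmit(tokens, snr_db, rng):
    """Send tokens over a BPSK/AWGN channel with hard-decision detection.

    One even-parity bit per token is a stand-in for real channel coding
    (NOT a polar code). Tokens that fail the check are replaced by MASK.
    """
    snr_linear = 10 ** (snr_db / 10)
    sigma = math.sqrt(1 / (2 * snr_linear))  # noise std for unit-energy symbols
    received = []
    for t in tokens:
        bits = tokens_to_bits([t])
        bits.append(sum(bits) % 2)  # append parity bit
        hard = []
        for b in bits:
            symbol = 1.0 - 2.0 * b  # map bit 0 -> +1, bit 1 -> -1
            y = symbol + rng.gauss(0.0, sigma)
            hard.append(0 if y >= 0 else 1)
        payload, check = hard[:-1], hard[-1]
        if sum(payload) % 2 == check:
            received.append(bits_to_tokens(payload)[0])
        else:
            received.append(MASK)
    return received


rng = random.Random(0)
tokens = [rng.randrange(2 ** BITS_PER_TOKEN) for _ in range(64)]
clean = transmit(tokens, snr_db=30, rng=rng)
noisy = transmit(tokens, snr_db=-10, rng=rng)
print(sum(t == MASK for t in noisy), "of", len(tokens),
      "tokens flagged for prediction at -10 dB")
```

In this toy setting the number of flagged tokens grows as the SNR drops, rather than the whole image failing at once. A single parity bit misses even numbers of bit errors, so some wrong tokens still pass the check here; a real system would rely on much stronger coding and detection.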

Problem

Research questions and friction points this paper is trying to address.

Efficient image transmission under adverse 6G channel conditions

Low-bandwidth wireless image transmission using text guidance

Mitigating the cliff effect and improving semantic preservation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages pre-trained foundation models

Uses text-guided token prediction

Applies 5G NR polar coding
