🤖 AI Summary
Transformer-based semantic communication models incur prohibitive computational overhead, hindering their deployment in 6G edge networks.
Method: This paper proposes a training-free, adaptive token fusion framework that jointly minimizes inference latency and transmission resource consumption by formulating layer-wise token merging ratios as a Pareto-optimal multi-objective optimization problem. A channel-aware runtime adaptation mechanism, built on Gaussian process-based Bayesian optimization, dynamically adjusts the merging strength according to the real-time signal-to-noise ratio (SNR). The method builds on pre-trained Vision Transformers (ViTs), Token Merging (ToMe), Bayesian optimization, and Pareto frontier search.
Contribution/Results: Experiments across diverse SNR conditions demonstrate significant reductions in floating-point operations (FLOPs), achieving inference acceleration while preserving semantic fidelity. The framework enables on-demand accuracy–efficiency trade-offs, validating its effectiveness for resource-constrained 6G edge semantic communication.
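The per-layer merging the summary describes follows the ToMe recipe: at each transformer layer, tokens are split into two alternating sets, matched by cosine similarity, and the `r` most similar pairs are averaged, shrinking the sequence before attention. A minimal NumPy sketch of one such merging step (the per-layer `r` is exactly the quantity the paper's optimizer tunes; function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def merge_tokens(x: np.ndarray, r: int) -> np.ndarray:
    """One ToMe-style bipartite soft-matching merge step.

    x: (n_tokens, dim) token embeddings; r: number of pairs to merge.
    Returns an array of shape (n_tokens - r, dim).
    """
    # Split tokens into two alternating sets A and B.
    a, b = x[0::2], x[1::2]
    # Cosine similarity between every token in A and every token in B.
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a_n @ b_n.T                      # shape (|A|, |B|)
    # For each A-token, find its most similar partner in B.
    best_b = sim.argmax(axis=1)
    best_sim = sim.max(axis=1)
    # Merge the r A-tokens whose best match is strongest.
    merge_idx = np.argsort(-best_sim)[:r]
    keep_idx = np.setdiff1d(np.arange(len(a)), merge_idx)

    merged_b = b.copy()
    counts = np.ones(len(b))               # running token counts per B-slot
    for i in merge_idx:
        j = best_b[i]
        # Incremental mean: fold a[i] into the tokens already at slot j.
        merged_b[j] = (merged_b[j] * counts[j] + a[i]) / (counts[j] + 1)
        counts[j] += 1
    return np.concatenate([a[keep_idx], merged_b], axis=0)
```

Because each step removes `r` tokens, attention cost (quadratic in sequence length) drops at every subsequent layer, which is where the FLOPs savings come from.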
📝 Abstract
Large-scale transformer models have emerged as a powerful tool for semantic communication systems, enabling edge devices to extract rich representations for robust inference across noisy wireless channels. However, their substantial computational demands remain a major barrier to practical deployment in resource-constrained 6G networks. In this paper, we present a training-free framework for adaptive token merging in pre-trained vision transformers to jointly reduce inference time and transmission resource usage. We formulate the selection of per-layer merging proportions as a multi-objective optimization problem to balance accuracy and computational cost. We employ Gaussian process-based Bayesian optimization to construct a Pareto frontier of optimal configurations, enabling flexible runtime adaptation to dynamic application requirements and channel conditions. Extensive experiments demonstrate that our method consistently outperforms baseline approaches, achieving significant reductions in floating-point operations while maintaining competitive accuracy across a wide range of signal-to-noise ratio (SNR) conditions. Additional results highlight the effectiveness of adaptive policies that adjust merging aggressiveness in response to channel quality, providing a practical mechanism to trade off latency and semantic fidelity on demand. These findings establish a scalable and efficient approach for deploying transformer-based semantic communication in future edge intelligence systems.
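The abstract's pipeline has two stages: offline, search the space of per-layer merging proportions and keep only the Pareto-optimal (accuracy, FLOPs) configurations; online, pick a point on that frontier according to the current SNR. The sketch below illustrates both stages under stated simplifications: it uses random sampling with a toy analytic objective in place of the paper's Gaussian process-based Bayesian optimization and real ViT evaluation, and it assumes (our reading of the adaptive policy) that high SNR tolerates aggressive merging while low SNR favors accuracy. All names and constants are illustrative:

```python
import random

def pareto_front(points):
    """Keep configs not dominated in (accuracy up, FLOPs down); sort by FLOPs."""
    front = [
        (acc, flops, cfg)
        for acc, flops, cfg in points
        if not any(a >= acc and f <= flops and (a > acc or f < flops)
                   for a, f, _ in points)
    ]
    return sorted(front, key=lambda p: p[1])

def evaluate(cfg):
    """Toy stand-in objective; the real system runs the merged ViT over a
    noisy channel. More merging cuts FLOPs but costs accuracy."""
    mean_ratio = sum(cfg) / len(cfg)
    flops = 1.0 - 0.6 * mean_ratio            # relative FLOPs
    acc = 0.9 - 0.25 * mean_ratio ** 1.5      # proxy accuracy
    return acc, flops

def build_front(n_layers=12, n_samples=200, seed=0):
    """Offline stage: sample per-layer merging ratios, keep the Pareto set.
    (The paper uses GP-based Bayesian optimization here; random search is
    a simplification for brevity.)"""
    rng = random.Random(seed)
    cands = []
    for _ in range(n_samples):
        cfg = tuple(rng.choice([0.0, 0.1, 0.2, 0.3, 0.4])
                    for _ in range(n_layers))
        acc, flops = evaluate(cfg)
        cands.append((acc, flops, cfg))
    return pareto_front(cands)

def select_for_snr(front, snr_db, snr_lo=0.0, snr_hi=20.0):
    """Online stage: map SNR to a frontier point. Assumption: good channels
    (high SNR) allow aggressive merging (low-FLOPs end of the front)."""
    t = min(max((snr_db - snr_lo) / (snr_hi - snr_lo), 0.0), 1.0)
    idx = round((1.0 - t) * (len(front) - 1))
    return front[idx]
```

At runtime only the precomputed frontier and the SNR-to-index mapping are needed, so switching operating points as channel quality drifts costs essentially nothing, which is what makes the on-demand latency/fidelity trade-off practical.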