🤖 AI Summary
Transformer-based semantic communication models incur prohibitive computational overhead, hindering their deployment in 6G edge networks.
Method: This paper proposes a training-free, adaptive token fusion framework that jointly minimizes inference latency and transmission resource consumption by formulating layer-wise token merging ratios as a Pareto-optimal multi-objective optimization problem. A channel-aware runtime adaptation mechanism, built on Gaussian process-based Bayesian optimization, dynamically adjusts the merging strength according to the real-time signal-to-noise ratio (SNR). The method builds on pre-trained Vision Transformers (ViTs), Token Merging (ToMe), Bayesian optimization, and Pareto frontier search.
Contribution/Results: Experiments across diverse SNR conditions demonstrate significant reductions in floating-point operations (FLOPs), achieving inference acceleration while preserving semantic fidelity. The framework enables on-demand accuracy–efficiency trade-offs, validating its effectiveness for resource-constrained 6G edge semantic communication.
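The per-layer merging the summary describes follows the ToMe recipe: at each transformer layer, tokens are split into two alternating sets, matched by cosine similarity, and the `r` most similar pairs are averaged, shrinking the sequence before attention. A minimal NumPy sketch of one such merging step (the per-layer `r` is exactly the quantity the paper's optimizer tunes; function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def merge_tokens(x: np.ndarray, r: int) -> np.ndarray:
    """One ToMe-style bipartite soft-matching merge step.

    x: (n_tokens, dim) token embeddings; r: number of pairs to merge.
    Returns an array of shape (n_tokens - r, dim).
    """
    # Split tokens into two alternating sets A and B.
    a, b = x[0::2], x[1::2]
    # Cosine similarity between every token in A and every token in B.
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a_n @ b_n.T                      # shape (|A|, |B|)
    # For each A-token, find its most similar partner in B.
    best_b = sim.argmax(axis=1)
    best_sim = sim.max(axis=1)
    # Merge the r A-tokens whose best match is strongest.
    merge_idx = np.argsort(-best_sim)[:r]
    keep_idx = np.setdiff1d(np.arange(len(a)), merge_idx)

    merged_b = b.copy()
    counts = np.ones(len(b))               # running token counts per B-slot
    for i in merge_idx:
        j = best_b[i]
        # Incremental mean: fold a[i] into the tokens already at slot j.
        merged_b[j] = (merged_b[j] * counts[j] + a[i]) / (counts[j] + 1)
        counts[j] += 1
    return np.concatenate([a[keep_idx], merged_b], axis=0)
```

Because each step removes `r` tokens, attention cost (quadratic in sequence length) drops at every subsequent layer, which is where the FLOPs savings come from.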
📝 Abstract
Large-scale transformer models have emerged as a powerful tool for semantic communication systems, enabling edge devices to extract rich representations for robust inference across noisy wireless channels. However, their substantial computational demands remain a major barrier to practical deployment in resource-constrained 6G networks. In this paper, we present a training-free framework for adaptive token merging in pre-trained vision transformers to jointly reduce inference time and transmission resource usage. We formulate the selection of per-layer merging proportions as a multi-objective optimization problem to balance accuracy and computational cost. We employ Gaussian process-based Bayesian optimization to construct a Pareto frontier of optimal configurations, enabling flexible runtime adaptation to dynamic application requirements and channel conditions. Extensive experiments demonstrate that our method consistently outperforms baseline approaches, achieving significant reductions in floating-point operations while maintaining competitive accuracy across a wide range of signal-to-noise ratio (SNR) conditions. Additional results highlight the effectiveness of adaptive policies that adjust merging aggressiveness in response to channel quality, providing a practical mechanism to trade off latency and semantic fidelity on demand. These findings establish a scalable and efficient approach for deploying transformer-based semantic communication in future edge intelligence systems.
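The abstract's pipeline has two stages: offline, search the space of per-layer merging proportions and keep only the Pareto-optimal (accuracy, FLOPs) configurations; online, pick a point on that frontier according to the current SNR. The sketch below illustrates both stages under stated simplifications: it uses random sampling with a toy analytic objective in place of the paper's Gaussian process-based Bayesian optimization and real ViT evaluation, and it assumes (our reading of the adaptive policy) that high SNR tolerates aggressive merging while low SNR favors accuracy. All names and constants are illustrative:

```python
import random

def pareto_front(points):
    """Keep configs not dominated in (accuracy up, FLOPs down); sort by FLOPs."""
    front = [
        (acc, flops, cfg)
        for acc, flops, cfg in points
        if not any(a >= acc and f <= flops and (a > acc or f < flops)
                   for a, f, _ in points)
    ]
    return sorted(front, key=lambda p: p[1])

def evaluate(cfg):
    """Toy stand-in objective; the real system runs the merged ViT over a
    noisy channel. More merging cuts FLOPs but costs accuracy."""
    mean_ratio = sum(cfg) / len(cfg)
    flops = 1.0 - 0.6 * mean_ratio            # relative FLOPs
    acc = 0.9 - 0.25 * mean_ratio ** 1.5      # proxy accuracy
    return acc, flops

def build_front(n_layers=12, n_samples=200, seed=0):
    """Offline stage: sample per-layer merging ratios, keep the Pareto set.
    (The paper uses GP-based Bayesian optimization here; random search is
    a simplification for brevity.)"""
    rng = random.Random(seed)
    cands = []
    for _ in range(n_samples):
        cfg = tuple(rng.choice([0.0, 0.1, 0.2, 0.3, 0.4])
                    for _ in range(n_layers))
        acc, flops = evaluate(cfg)
        cands.append((acc, flops, cfg))
    return pareto_front(cands)

def select_for_snr(front, snr_db, snr_lo=0.0, snr_hi=20.0):
    """Online stage: map SNR to a frontier point. Assumption: good channels
    (high SNR) allow aggressive merging (low-FLOPs end of the front)."""
    t = min(max((snr_db - snr_lo) / (snr_hi - snr_lo), 0.0), 1.0)
    idx = round((1.0 - t) * (len(front) - 1))
    return front[idx]
```

At runtime only the precomputed frontier and the SNR-to-index mapping are needed, so switching operating points as channel quality drifts costs essentially nothing, which is what makes the on-demand latency/fidelity trade-off practical.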