🤖 AI Summary
This work addresses the challenge of efficiently transmitting high-dimensional AI models over wireless channels, where limited communication resources are exacerbated by conventional retransmission mechanisms that ignore the heterogeneous impact of individual parameters on model performance. To overcome this limitation, the paper proposes PASAR (Parameter Sensitivity-Aware Adaptive Retransmission), a novel framework that, for the first time, integrates parameter sensitivity into wireless retransmission decisions. By jointly considering real-time channel error rates and the importance of model parameters, PASAR dynamically allocates resources and adaptively terminates transmission when sufficient accuracy is achieved. The method introduces an online retransmission protocol guided by the sensitivity distribution of parameters, enabling intelligent, packet-level retransmission strategies. Experimental results across multiple deep neural networks and real-world datasets demonstrate that PASAR significantly outperforms traditional HARQ schemes, achieving comparable or better model accuracy while substantially reducing communication overhead and transmission latency.
📝 Abstract
The edge artificial intelligence (AI) applications in next-generation mobile networks demand efficient AI-model downloading techniques to support real-time, on-device inference. However, transmitting high-dimensional AI models over wireless channels remains challenging due to limited communication resources. To address this issue, we propose a parametric-sensitivity-aware retransmission (PASAR) framework that manages radio-resource usage of different parameter packets according to their importance on model inference accuracy, known as parametric sensitivity. Empirical analysis reveals a highly right-skewed sensitivity distribution, indicating that only a small fraction of parameters significantly affect model performance. Leveraging this insight, we design a novel online retransmission protocol, i.e., the PASAR protocol, that adaptively terminates packet transmission based on real-time bit error rate (BER) measurements and the associated parametric sensitivity. The protocol employs an adaptive, round-wise stopping criterion, enabling heterogeneous, packet-level retransmissions that preserve overall model functionality but reduce overall latency. Extensive experiments across diverse deep neural network architectures and real-world datasets demonstrate that PASAR substantially outperforms classical hybrid automatic repeat request (HARQ) schemes in terms of communication efficiency and latency.