🤖 AI Summary
To address the challenges of scarce link-level data, stringent real-time requirements, network heterogeneity, and constrained 4G baseband hardware resources in AI deployment for radio access networks (RANs), this paper proposes a reinforcement learning-based policy distillation framework for lightweight link adaptation. We design both single-policy and multi-policy distillation mechanisms to consolidate knowledge from multiple scenario-specific expert models, yielding a unified student model with strong generalization capability and hardware efficiency. Evaluated in a high-fidelity 5G simulation environment, the distilled student model achieves a model size under 1 MB and inference latency under 100 μs, closely matching teacher-model performance while significantly improving cross-scenario robustness. Our approach effectively resolves the longstanding trade-off among accuracy, efficiency, and generalizability for AI models under resource-constrained RAN deployments.
📄 Abstract
Adopting artificial intelligence (AI) in radio access networks (RANs) presents several challenges, including limited availability of link-level measurements (e.g., CQI reports), stringent real-time processing constraints (e.g., sub-1 ms per TTI), and network heterogeneity (different spectrum bands, cell types, and vendor equipment). A critical yet often overlooked barrier lies in the computational and memory limitations of RAN baseband hardware, particularly in legacy 4th Generation (4G) systems, which typically lack on-chip neural accelerators. As a result, only lightweight AI models (under 1 MB and sub-100 μs inference time) can be effectively deployed, limiting both their performance and applicability. However, achieving strong generalization across diverse network conditions often requires large-scale models with substantial resource demands. To address this trade-off, this paper investigates policy distillation in the context of a reinforcement learning-based link adaptation task. We explore two strategies: single-policy distillation, where a scenario-agnostic teacher model is compressed into one generalized student model; and multi-policy distillation, where multiple scenario-specific teachers are consolidated into a single generalist student. Experimental evaluations in a high-fidelity, 5th Generation (5G)-compliant simulator demonstrate that both strategies produce compact student models that preserve the teachers' generalization capabilities while complying with the computational and memory limitations of existing RAN hardware.
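To make the multi-policy distillation idea concrete, the following is a minimal, self-contained sketch (not the paper's actual architecture or training pipeline): several hypothetical scenario-specific teacher policies map a link-state observation to a distribution over modulation-and-coding actions, and a compact linear student is trained to minimize the average KL divergence to whichever teacher owns the sampled scenario. All dimensions, teacher weights, and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl(p, q, eps=1e-12):
    # Mean KL(p || q) over a batch of categorical distributions.
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

# Hypothetical setup: 3 scenario-specific teachers, each mapping an
# 8-dim link-state observation to logits over 16 MCS-like actions.
n_scenarios, obs_dim, n_actions = 3, 8, 16
teachers = [rng.normal(size=(obs_dim, n_actions)) for _ in range(n_scenarios)]

# Compact student: a single linear policy (purely illustrative).
W = np.zeros((obs_dim, n_actions))

# Fixed evaluation set to compare the student before and after distillation.
eval_obs = rng.normal(size=(256, obs_dim))
def avg_eval_kl(Wmat):
    return np.mean([kl(softmax(eval_obs @ t), softmax(eval_obs @ Wmat))
                    for t in teachers])

baseline = avg_eval_kl(W)  # untrained student is uniform over actions

lr, steps, batch = 0.5, 300, 64
for _ in range(steps):
    # Sample one scenario per step and a batch of observations from it.
    s = rng.integers(n_scenarios)
    obs = rng.normal(size=(batch, obs_dim))
    p_teacher = softmax(obs @ teachers[s])   # distillation target
    p_student = softmax(obs @ W)
    # Gradient of KL(teacher || student) w.r.t. the student logits
    # is simply (p_student - p_teacher); backprop through the linear map.
    grad_logits = (p_student - p_teacher) / batch
    W -= lr * obs.T @ grad_logits

distilled = avg_eval_kl(W)
```

A single linear student cannot match three distinct teachers exactly, so `distilled` does not reach zero; the point of the sketch is only that one compact policy can move much closer to all teachers at once than an untrained baseline, which is the trade-off the multi-policy strategy exploits.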