Practical Policy Distillation for Reinforcement Learning in Radio Access Networks

📅 2025-11-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the challenges of scarce link-level data, stringent real-time requirements, network heterogeneity, and constrained 4G baseband hardware resources in AI deployment for radio access networks (RAN), this paper proposes a reinforcement learning-based policy distillation framework for lightweight link adaptation. The authors design both single-policy and multi-policy distillation mechanisms to consolidate knowledge from multiple scenario-specific expert models, yielding a unified student model with strong generalization capability and hardware efficiency. Evaluated in a high-fidelity 5G simulation environment, the distilled student model achieves <1 MB model size and <100 µs inference latency, closely matching teacher-model performance while significantly enhancing cross-scenario robustness. The approach effectively resolves the longstanding trade-off among accuracy, efficiency, and generalizability for AI models under resource-constrained RAN deployments.

📝 Abstract
Adopting artificial intelligence (AI) in radio access networks (RANs) presents several challenges, including limited availability of link-level measurements (e.g., CQI reports), stringent real-time processing constraints (e.g., sub-1 ms per TTI), and network heterogeneity (different spectrum bands, cell types, and vendor equipment). A critical yet often overlooked barrier lies in the computational and memory limitations of RAN baseband hardware, particularly in legacy 4th Generation (4G) systems, which typically lack on-chip neural accelerators. As a result, only lightweight AI models (under 1 MB and sub-100 µs inference time) can be effectively deployed, limiting both their performance and applicability. However, achieving strong generalization across diverse network conditions often requires large-scale models with substantial resource demands. To address this trade-off, this paper investigates policy distillation in the context of a reinforcement learning-based link adaptation task. We explore two strategies: single-policy distillation, where a scenario-agnostic teacher model is compressed into one generalized student model; and multi-policy distillation, where multiple scenario-specific teachers are consolidated into a single generalist student. Experimental evaluations in a high-fidelity, 5th Generation (5G)-compliant simulator demonstrate that both strategies produce compact student models that preserve the teachers' generalization capabilities while complying with the computational and memory limitations of existing RAN hardware.
Problem

Research questions and friction points this paper is trying to address.

Addressing computational and memory constraints in RAN hardware for AI deployment
Overcoming limited link-level measurements and real-time processing requirements
Bridging the gap between model generalization and hardware limitations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Policy distillation compresses large reinforcement learning models
Single-policy distillation creates generalized student from teacher
Multi-policy distillation consolidates specialized teachers into student
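The multi-policy idea above can be illustrated with a minimal sketch: a single student policy is trained to minimize the average KL divergence from several scenario-specific teacher policies. All names, sizes, and the tabular softmax parameterization below are illustrative assumptions, not the paper's actual model or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_scenarios = 8, 4, 3  # hypothetical sizes

# Hypothetical scenario-specific teacher policies: one row of action
# probabilities per state (each row sums to 1).
teachers = [rng.dirichlet(np.ones(n_actions), size=n_states)
            for _ in range(n_scenarios)]

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Student: a single logit table. The gradient of KL(teacher || student)
# with respect to the student's logits is (student_probs - teacher_probs),
# so multi-policy distillation averages that gradient over all teachers.
logits = np.zeros((n_states, n_actions))
lr = 0.5
for _ in range(500):
    probs = softmax(logits)
    grad = sum(probs - t for t in teachers) / n_scenarios
    logits -= lr * grad

student = softmax(logits)
avg_teacher = np.mean(teachers, axis=0)
# Minimizing the average KL drives this tabular student toward the
# teachers' mean policy, i.e. a single generalist consolidation.
```

Single-policy distillation is the special case `n_scenarios = 1`: one scenario-agnostic teacher compressed into a smaller student. In practice the student would be a compact neural network meeting the sub-1 MB / sub-100 µs budget rather than a table, but the distillation loss has the same form.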