🤖 AI Summary
To address the poor generalizability of conventional rule-based radio resource management (RRM) algorithms in dynamic, heterogeneous wireless environments, and the tendency of existing reinforcement learning (RL) approaches to overfit, this paper proposes a generalizable RL framework for 6G radio access networks (RANs). Methodologically, it employs a graph attention network (GAT) to model inter-cell topological relationships, applies domain randomization to improve robustness across environments, and adopts an O-RAN-aligned, cloud-native architecture that centralizes training while collecting data asynchronously from distributed actors, enabling a single agent to generalize across scenarios. The key innovation is the combined use of attention-based graph representations, domain randomization, and cloud-edge collaborative training. Evaluated on downlink link adaptation across five 5G benchmark scenarios, the framework raises average throughput by 10–20%, outperforms an MLP baseline by 30% in a nine-cell scenario, and achieves up to 4× and 2× gains for eMBB and mixed traffic, respectively.
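The GAT component described above can be illustrated with a minimal single-head graph-attention layer over a cell graph. This is a generic NumPy sketch of the standard GAT mechanism, not the paper's implementation; the 9-cell ring topology, feature sizes, and all parameter names are illustrative assumptions.

```python
import numpy as np

def gat_layer(H, A, W, a, slope=0.2):
    """Single-head graph attention over a cell graph.

    H: (N, F) per-cell features, A: (N, N) adjacency with self-loops,
    W: (F, Fp) shared projection, a: (2*Fp,) attention vector.
    Each cell aggregates its neighbors' projected features, weighted
    by attention scores computed from concatenated feature pairs.
    """
    Z = H @ W
    N = Z.shape[0]
    e = np.full((N, N), -np.inf)          # -inf masks non-neighbors
    for i in range(N):
        for j in range(N):
            if A[i, j] > 0:
                x = a @ np.concatenate([Z[i], Z[j]])
                e[i, j] = x if x > 0 else slope * x   # LeakyReLU
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)         # softmax over neighbors
    return alpha @ Z                                   # (N, Fp) embeddings

# Illustrative 9-cell ring topology with self-loops.
rng = np.random.default_rng(0)
N, F, Fp = 9, 4, 8
A = np.eye(N)
for i in range(N):
    A[i, (i - 1) % N] = A[i, (i + 1) % N] = 1.0
out = gat_layer(rng.normal(size=(N, F)), A, rng.normal(size=(F, Fp)),
                rng.normal(size=2 * Fp))
```

Because attention weights are computed per neighbor pair rather than per fixed input slot, the same layer applies unchanged to deployments with different cell counts, which is what lets one policy transfer across topologies.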
📝 Abstract
Modern RANs operate in highly dynamic and heterogeneous environments, where hand-tuned, rule-based RRM algorithms often underperform. While RL can surpass such heuristics in constrained settings, the diversity of deployments and unpredictable radio conditions pose major generalization challenges: data-driven policies frequently overfit to training conditions and degrade in unseen scenarios. To address this, we propose a generalization-centered RL framework for RAN control that: (i) encodes cell topology and node attributes via attention-based graph representations; (ii) applies domain randomization to broaden the training distribution; and (iii) distributes data generation across multiple actors while centralizing training in a cloud-compatible architecture aligned with O-RAN principles. Although generalization increases computational and data-management complexity, our distributed design mitigates this by scaling data collection and training across diverse network conditions. Applied to downlink link adaptation in five 5G benchmarks, our policy improves average throughput and spectral efficiency by ~10% over an OLLA baseline (10% BLER target) in full-buffer MIMO/mMIMO scenarios and by >20% under high mobility. It matches specialized RL agents on full-buffer traffic and achieves up to 4- and 2-fold gains in the eMBB and mixed-traffic benchmarks, respectively. In nine-cell deployments, GAT models deliver 30% higher throughput than MLP baselines. These results, combined with our scalable architecture, chart a path toward AI-native 6G RANs driven by a single, generalizable RL agent.
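For context on the baseline: OLLA is the standard outer-loop link adaptation rule, which nudges an SINR/CQI offset up on each HARQ ACK and down on each NACK, with the step-size ratio chosen so the long-run error rate converges to the BLER target (10% here, making the down-step nine times the up-step). A minimal sketch, with illustrative parameter names and step size:

```python
def olla_step(offset_db, ack, bler_target=0.10, delta_up=0.1):
    """One outer-loop link adaptation (OLLA) update of the SINR offset.

    The offset is added to the reported channel quality before MCS
    selection. Stepping up on ACK and down on NACK with
    delta_down/delta_up = (1 - bler_target)/bler_target drives the
    long-run NACK rate toward bler_target at equilibrium.
    """
    delta_down = delta_up * (1.0 - bler_target) / bler_target
    return offset_db + delta_up if ack else offset_db - delta_down

# At a 10% BLER target, one NACK cancels nine ACKs.
offset = 0.0
offset = olla_step(offset, ack=True)    # +0.1 dB
offset = olla_step(offset, ack=False)   # -0.9 dB
```

This fixed-step feedback loop is what makes OLLA robust but slow to adapt, which is the performance gap the learned policy exploits, particularly under high mobility.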