🤖 AI Summary
This work addresses the low sample efficiency, poor generalization, and costly retraining that limit conventional deep reinforcement learning (DRL) for dynamic multi-task resource management in Coordinated Multi-Point (CoMP) wireless networks. To overcome these limitations, the authors propose a Prompt Decision Transformer (PromptDT) framework that reformulates multi-cell selection as a sequence modeling problem. By combining offline reinforcement learning with task-specific prompts, the approach introduces prompting mechanisms into wireless resource management for the first time and enables few-shot adaptation to unseen network configurations (for example, different numbers of base stations or users, or different scheduling policies) without fine-tuning. Experimental results show up to a 49% improvement in Quality of Experience (QoE) over baselines in multi-task scenarios, with performance scaling positively with model capacity.
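The sequence-modeling formulation with task-specific prompts can be illustrated with a minimal sketch of how a Prompt Decision Transformer input is typically assembled: each time step becomes a (return-to-go, state, action) triple, and a short segment from a demonstration of the target task is prepended to the context window. The summary does not describe the paper's state or action encoding, so the trajectory dictionary format, the state features, the multi-hot cell-selection actions, and all function names below are hypothetical; only the sequence construction itself is shown, not the transformer.

```python
import numpy as np


def returns_to_go(rewards):
    """Undiscounted return-to-go, as in the Decision Transformer:
    rtg[t] = rewards[t] + rewards[t+1] + ... + rewards[T-1]."""
    rewards = np.asarray(rewards, dtype=float)
    return np.cumsum(rewards[::-1])[::-1]


def segment(traj, start, length):
    """Slice (rtg, states, actions) for steps [start, start + length).

    Returns-to-go are computed over the whole trajectory before slicing,
    so each token carries the return still to be collected from that step.
    """
    n_steps = len(traj["rewards"])
    if start < 0 or start + length > n_steps:
        raise ValueError("segment falls outside the trajectory")
    rtg = returns_to_go(traj["rewards"])
    end = start + length
    return rtg[start:end], traj["states"][start:end], traj["actions"][start:end]


def build_training_input(prompt_traj, traj, t0, prompt_len, context_len):
    """Prepend a task prompt to a context window from an offline trajectory.

    Both arguments use a hypothetical format: a dict with "states" (T, d_s),
    "actions" (T, d_a) and "rewards" (T,). prompt_traj is a demonstration
    from the same task as traj.
    """
    # Prompt tokens: the first `prompt_len` steps of the demonstration.
    p_rtg, p_s, p_a = segment(prompt_traj, 0, prompt_len)
    # Context tokens: `context_len` steps of the offline trajectory from t0.
    c_rtg, c_s, c_a = segment(traj, t0, context_len)
    # The model consumes [prompt ; context] as one sequence of triples.
    return (np.concatenate([p_rtg, c_rtg]),
            np.concatenate([p_s, c_s]),
            np.concatenate([p_a, c_a]))


# Hypothetical task: 4 state features, multi-hot selection over 3 candidate cells.
rng = np.random.default_rng(0)
demo = {"states": rng.normal(size=(5, 4)),
        "actions": rng.integers(0, 2, size=(5, 3)),
        "rewards": np.ones(5)}
episode = {"states": rng.normal(size=(10, 4)),
           "actions": rng.integers(0, 2, size=(10, 3)),
           "rewards": np.arange(10, dtype=float)}
rtg, states, actions = build_training_input(demo, episode, t0=3,
                                            prompt_len=2, context_len=4)
print(rtg)  # values: 5, 4, 42, 39, 35, 30
```

Swapping the demonstration for one drawn from a different network configuration changes only the prompt tokens, which is how a prompt-conditioned model can be steered toward a new task without updating its weights.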
📝 Abstract
Future wireless networks demand rapid adaptation to highly heterogeneous environments and dynamic task configurations, necessitating a shift from conventional rule-based and optimization-driven radio resource management (RRM) toward artificial intelligence (AI)-driven RRM. AI-driven approaches can learn complex nonlinear relationships, generalize across diverse network conditions, and enable real-time, scalable, and autonomous decision-making. Among RRM techniques, coordinated multipoint (CoMP) transmission is pivotal for mitigating inter-cell interference and enhancing cell-edge performance, thereby improving quality of experience (QoE) in dense deployments. However, optimal multi-cell selection remains a hard combinatorial problem, as it requires jointly optimizing over many possible serving-cell combinations under dynamic traffic and channel conditions. Despite their success, conventional deep reinforcement learning (DRL) methods such as proximal policy optimization (PPO) suffer from poor sample efficiency, limited generalization, and costly retraining whenever the state and action spaces change. To address these bottlenecks, we propose a Prompt Decision Transformer (PromptDT)-based multi-task learning framework that reformulates multi-cell selection as a sequence modeling problem. By leveraging offline trajectories and task-specific prompts, PromptDT learns across diverse network configurations, including varying numbers of base stations and user equipment (UE), and different scheduler policies. Experimental results show that PromptDT improves QoE by up to 49% over baselines in multi-task settings, with performance scaling positively with model capacity. Moreover, PromptDT generalizes effectively to unseen tasks, achieving robust few-shot adaptation to new network configurations without retraining or fine-tuning.
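To make the combinatorial nature of multi-cell selection concrete, the short sketch below counts candidate serving-cell combinations. It assumes, purely for illustration, that each UE is served by a non-empty set of at most `max_cluster` cells chosen from `n_cells` candidates and that UEs choose independently; the abstract does not state the paper's actual constraints, and the function names are hypothetical.

```python
from math import comb


def serving_sets_per_ue(n_cells, max_cluster):
    """Number of non-empty serving-cell sets of size <= max_cluster
    drawn from n_cells candidate cells."""
    return sum(comb(n_cells, k) for k in range(1, max_cluster + 1))


def joint_action_space(n_cells, max_cluster, n_ues):
    """Joint choices when every UE independently picks its own serving set."""
    return serving_sets_per_ue(n_cells, max_cluster) ** n_ues


print(serving_sets_per_ue(7, 3))     # 63 = 7 + 21 + 35
print(joint_action_space(7, 3, 10))  # 63**10, roughly 9.8e17
```

Under these assumptions the action set's size depends directly on the number of cells and UEs, which illustrates why a policy with a fixed-size output layer would have to be rebuilt and retrained when those counts change.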