🤖 AI Summary
To address the challenge of dynamically co-scheduling real-time RAN tasks and generative AI (GenAI) workloads on shared infrastructure in 6G networks, this paper proposes an AI-RAN convergence platform architecture. Built upon the O-RAN framework, it extends the near-real-time RIC with a novel Y1-interface-driven RAN metric awareness mechanism and designs an end-to-end resource scheduler based on Soft Actor-Critic (SAC) reinforcement learning, enabling native GenAI workload support in the RAN while preserving the priority of low-latency tasks. Leveraging fine-grained Multi-Instance GPU (MIG) partitioning, the architecture improves resource elasticity. Simulation results demonstrate a 99% RAN request satisfaction rate, markedly more efficient GenAI workload support, and 100% infrastructure resource utilization, substantially outperforming static allocation baselines.
📝 Abstract
The concept of AI-RAN, as specified by the AI-RAN Alliance, aims to explore a converged 6G platform that can support management, orchestration, and deployment of both AI and RAN workloads. This concept is central to the development of a 6G architecture that exploits accelerated compute capabilities to support both real-time signal processing and offloading of Generative AI (GenAI) workloads. However, both the architectural framework required to support this vision and the accompanying dynamic resource allocation strategy are still in their infancy. The O-RAN architecture intrinsically allows a cloud-native, disaggregated implementation. Consequently, we explore a framework that enables orchestration of AI and RAN workloads by expanding the Near Real-Time RAN Intelligent Controller (NRT-RIC) within O-RAN. The framework incorporates a monitoring xApp that tracks RAN KPIs and exposes radio analytics to the proposed E2E orchestrator via the recently introduced Y1 interface. The orchestrator implements a Soft Actor-Critic (SAC) reinforcement learning algorithm to dynamically allocate critical computing resources, e.g., Multi-Instance GPUs (MIGs), between latency-sensitive RAN network functions and computationally intensive AI workloads on shared RAN infrastructure. The proposed framework provides insight into how the traditional RAN architecture can evolve to inherently support emerging GenAI workloads. Our framework prioritizes the real-time requirements of RAN workloads while maintaining efficient resource sharing for AI applications. The simulation results demonstrate the benefits of the proposed framework: it meets nearly 99% of RAN workload requests while effectively supporting AI workloads and achieving 100% utilization of the RAN infrastructure resources in a dynamic environment.
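The abstract describes a SAC agent whose action is the split of MIG slices between latency-sensitive RAN functions and AI workloads. The paper's actual state space, action encoding, and reward are not given here, so the following is only an illustrative sketch of how such a scheduling step and a RAN-prioritizing reward might be modeled; the slice count, demand units, and penalty weights are all assumptions, not values from the paper.

```python
# Hypothetical MIG profile: an NVIDIA A100 can be partitioned into up to
# seven 1g.10gb slices, so we model the action as "slices reserved for RAN".
TOTAL_SLICES = 7

def step(ran_demand, ai_demand, slices_to_ran):
    """One scheduling interval: allocate MIG slices, return a shaped reward.

    ran_demand / ai_demand: slices requested this interval (illustrative units).
    slices_to_ran: the agent's action, i.e., slices reserved for RAN functions.
    """
    slices_to_ai = TOTAL_SLICES - slices_to_ran
    ran_served = min(ran_demand, slices_to_ran)
    ai_served = min(ai_demand, slices_to_ai)

    # Assumed reward shaping: unmet RAN demand is penalized far more heavily
    # than unmet AI demand, reflecting the latency-critical RAN priority the
    # abstract describes; idle slices incur a small penalty to push the policy
    # toward full infrastructure utilization.
    ran_miss = ran_demand - ran_served
    ai_miss = ai_demand - ai_served
    idle = TOTAL_SLICES - ran_served - ai_served
    reward = -10.0 * ran_miss - 1.0 * ai_miss - 0.1 * max(idle, 0)
    return ran_served, ai_served, reward

# Example: RAN requests 3 slices, AI requests 5, agent reserves 3 for RAN.
ran_served, ai_served, reward = step(ran_demand=3, ai_demand=5, slices_to_ran=3)
```

In a full SAC setup, this step function would sit inside the environment loop, with the observed KPIs (e.g., from the monitoring xApp over Y1) forming the state and the shaped reward driving the actor and critic updates.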