AI Summary
This work addresses the degradation of communication quality that mobile gNodeBs suffer in dynamic wireless environments when line-of-sight (LoS) paths are blocked. To tackle this challenge, the authors propose VisionRAN, a novel architecture that, for the first time, integrates visual perception into the O-RAN framework. By fusing visual and radio-frequency data, they design a multimodal-aware xApp, termed VisionApp, which leverages a deep Q-network (DQN) to dynamically optimize gNodeB placement and maintain stable links. A digital twin platform, VisionTwin, is developed to integrate visual data and to enable training and validation. Experimental results demonstrate that the proposed approach reduces LoS blockage duration by up to 75% compared to static deployments, confirming the value of multimodal perception and learning-based control in next-generation radio access networks.
Abstract
This paper proposes a vision-based framework for the intelligent control of mobile Open Radio Access Network (O-RAN) base stations (gNBs) operating in dynamic wireless environments. The framework comprises three innovative components. The first is the introduction of novel Service Models (SMs) within a vision-enabled O-RAN architecture, termed VisionRAN. These SMs extend state-of-the-art O-RAN-based architectures by enabling the transmission of vision-based sensing data and gNB positioning control messages. The second is an O-RAN xApp, VisionApp, which fuses vision and radio data and uses a Deep Q-Network (DQN) to control the position of a mobile gNB based on the fused information. The third is a digital twin environment, VisionTwin, which incorporates vision data and can emulate realistic wireless scenarios; this digital twin was used to train the DQN running in VisionApp and to validate the overall system. Experimental results, obtained using real vision data and an emulated radio, demonstrate that the proposed approach reduces the duration of Line-of-Sight (LoS) blockages by up to 75% compared to a static gNB. These findings confirm the viability of integrating multimodal perception and learning-based control within RANs.
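The core control idea, an agent that learns to reposition a gNB so the link stays in LoS, can be illustrated with a toy sketch. The paper's VisionApp uses a DQN trained in VisionTwin on fused vision and radio inputs; the snippet below is a deliberately simplified stand-in that uses tabular Q-learning on a hypothetical 5x5 grid with a single blocking column, so it stays dependency-free. The grid size, user position, obstacle geometry, and reward shaping are all illustrative assumptions, not the paper's setup.

```python
import random

# Toy stand-in for VisionApp's placement controller (assumption: tabular
# Q-learning instead of a DQN, and toy geometry instead of a real
# propagation model). The gNB moves on a 5x5 grid and is rewarded for
# keeping line of sight (LoS) to a fixed user.

GRID = 5
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]  # up/down/left/right/stay
USER = (4, 2)          # fixed user position (illustrative assumption)
OBSTACLE_COL = 2       # column blocked between the gNB and the user


def has_los(gnb):
    # LoS is lost when the gNB sits in the obstacle column, "behind"
    # the obstacle relative to the user (toy geometry only).
    return not (gnb[1] == OBSTACLE_COL and gnb[0] < USER[0])


def step(gnb, action):
    # Move with grid clipping; reward +1 for LoS, -1 for blockage.
    nr = min(GRID - 1, max(0, gnb[0] + action[0]))
    nc = min(GRID - 1, max(0, gnb[1] + action[1]))
    nxt = (nr, nc)
    return nxt, (1.0 if has_los(nxt) else -1.0)


def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {}  # (state, action_index) -> value
    for _ in range(episodes):
        gnb = (0, OBSTACLE_COL)  # start each episode in a blocked spot
        for _ in range(20):
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q.get((gnb, i), 0.0))
            nxt, r = step(gnb, ACTIONS[a])
            best_next = max(q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
            old = q.get((gnb, a), 0.0)
            q[(gnb, a)] = old + alpha * (r + gamma * best_next - old)
            gnb = nxt
    return q


def evaluate(q, steps=20):
    # Greedy rollout: count how many steps remain blocked.
    gnb = (0, OBSTACLE_COL)
    blocked = 0
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: q.get((gnb, i), 0.0))
        gnb, _ = step(gnb, ACTIONS[a])
        blocked += 0 if has_los(gnb) else 1
    return blocked


if __name__ == "__main__":
    q = train()
    print("blocked steps, learned policy:", evaluate(q))
```

In this toy, a static gNB left at the starting cell would remain blocked for all 20 evaluation steps, while the learned policy moves out of the obstacle column almost immediately, mirroring (in miniature) the blockage-duration reduction the paper reports from its DQN-driven repositioning.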