🤖 AI Summary
The longstanding disconnect between telecommunications and computer vision hinders the realization of integrated sensing and communication (ISAC) in 6G systems. Method: This paper proposes a multi-agent vision–radio cooperative architecture tailored for Open RAN (O-RAN), which tightly fuses real-time video streams with RF channel sensing data. It introduces lightweight video functional modules and a low-overhead cross-modal feature alignment mechanism, enabling unified obstacle detection, localization, and channel impact prediction at the xApp layer. Contribution/Results: The architecture achieves sub-millisecond (<1 ms) generation and distribution of sensing information—enabling real-time xApp-driven beamforming and link switching for the first time. Experiments demonstrate significant improvements in line-of-sight link robustness and dynamic control accuracy, establishing a deployable edge intelligence paradigm for 5G-Advanced and 6G ISAC systems.
📝 Abstract
Telecommunications and computer vision have evolved independently. With the emergence of high-frequency wireless links that operate mostly in line-of-sight, visual data can help predict channel dynamics by detecting obstacles, and help overcome them through beamforming or handover techniques.
This paper proposes a novel architecture for delivering real-time radio and video sensing information to O-RAN xApps through a multi-agent approach, and introduces a new video function capable of generating blockage information for xApps, enabling Integrated Sensing and Communications. Experimental results show that the delay of sensing information remains under 1 ms and that an xApp can successfully use radio and video sensing information to control the 5G/6G RAN in real time.
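To illustrate the kind of control logic such an xApp could run, the sketch below shows a minimal decision rule that fuses a video-derived blockage prediction with radio sensing to choose between keeping the link, proactively switching beams, or handing over. All names (`BlockageReport`, `decide_action`) and the threshold values are hypothetical illustrations, not the paper's actual interfaces.

```python
# Hypothetical xApp decision loop fusing video and radio sensing.
# Names and thresholds are illustrative assumptions, not the paper's API.
from dataclasses import dataclass


@dataclass
class BlockageReport:
    """One fused sensing sample delivered to the xApp."""
    blockage_prob: float  # from the video function, in [0, 1]
    snr_db: float         # from RF channel sensing


def decide_action(report: BlockageReport,
                  blockage_thresh: float = 0.7,
                  snr_floor_db: float = 10.0) -> str:
    """Return a RAN control action.

    - SNR already below the floor: the link is failing, hand over.
    - Visual blockage predicted while SNR is still good: switch beams
      proactively, before the obstacle degrades the link.
    - Otherwise: keep the current link.
    """
    if report.snr_db < snr_floor_db:
        return "handover"
    if report.blockage_prob >= blockage_thresh:
        return "switch_beam"
    return "keep_link"


# The camera predicts an obstacle before the SNR drops:
print(decide_action(BlockageReport(blockage_prob=0.9, snr_db=18.0)))  # switch_beam
print(decide_action(BlockageReport(blockage_prob=0.2, snr_db=6.0)))   # handover
```

The key point the paper's results support is that the visual branch can fire before the radio branch, which is what makes proactive beam switching possible under sub-millisecond sensing delivery.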