🤖 AI Summary
Existing reinforcement learning approaches for dynamic wireless networks suffer from low data efficiency, weak integration of physical and logical modeling, and limited long-horizon planning capability. To address these challenges, this paper proposes a dual-mind world model framework: System 1 learns statistical patterns from network observations, while System 2 explicitly encodes the physics- and logic-driven dynamics of millimeter-wave V2X channels. Inspired by cognitive psychology’s dual-process theory—introduced here for the first time to wireless resource scheduling—the framework integrates Sionna-based channel simulation, ray tracing, and differentiable imagined rollouts to enable end-to-end, logically consistent state inference and decision-making without fresh observations. Experiments demonstrate that the method significantly outperforms model-free RL (MFRL), model-based RL (MBRL), and single-system world models in data efficiency, generalization to unseen scenarios, and completeness-weighted age of information (CAoI) optimization, thereby substantially improving long-term scheduling performance.
📝 Abstract
Despite the popularity of reinforcement learning (RL) in wireless networks, existing approaches that rely on model-free RL (MFRL) and model-based RL (MBRL) are data inefficient and short-sighted. Such RL-based solutions cannot generalize to novel network states since they capture only statistical patterns in wireless data rather than the underlying physics and logic. These limitations become particularly challenging in complex wireless networks with high dynamics and long-term planning requirements. To address them, in this paper, a novel dual-mind world model-based learning framework is proposed with the goal of optimizing the completeness-weighted age of information (CAoI) in a challenging mmWave V2X scenario. Inspired by cognitive psychology, the proposed dual-mind world model encompasses a pattern-driven System 1 component and a logic-driven System 2 component to learn the dynamics and logic of the wireless network, and to provide long-term link scheduling over reliable imagined trajectories. Link scheduling is learned through end-to-end differentiable imagined trajectories with logical consistency over an extended horizon, rather than relying on wireless data obtained from environment interactions. Moreover, through imagination rollouts, the proposed world model can jointly reason about network states and plan link scheduling, so it remains capable of making efficient decisions during intervals without observations. Extensive experiments are conducted on a realistic Sionna-based simulator with real-world physical channels, ray tracing, and scene objects with material properties. Simulation results show that the proposed world model achieves a significant improvement in data efficiency, along with strong generalization and adaptation to unseen environments, compared with state-of-the-art RL baselines and a world model using only System 1.
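To make the planning idea concrete, the following is a minimal toy sketch (not the paper's implementation) of scheduling over imagined rollouts: a pattern-driven System 1 stand-in predicts link-quality dynamics, a logic-driven System 2 stand-in enforces a physical-consistency constraint, and candidate link schedules are scored in imagination by an age-of-information proxy without any new environment observations. The linear dynamics, the clipping rule, the AoI proxy, and random-shooting search are all illustrative assumptions, in place of the paper's learned models and differentiable planning.

```python
import numpy as np

rng = np.random.default_rng(0)

N_LINKS = 3   # candidate V2X links (illustrative)
HORIZON = 5   # imagination horizon: steps planned without observations

# System 1 stand-in: pattern-driven latent dynamics (a fixed linear map here;
# learned from data in the actual method).
A = 0.9 * np.eye(N_LINKS)

def system1_step(state, action):
    """Predict next link-quality state; scheduling a link boosts it slightly."""
    onehot = np.eye(N_LINKS)[action]
    return A @ state + 0.1 * onehot

def system2_project(state):
    """Logic-driven correction: enforce physical consistency
    (here simply: link quality stays in [0, 1])."""
    return np.clip(state, 0.0, 1.0)

def imagined_rollout(state, actions):
    """Roll the model forward over an action sequence; return a total
    age-of-information proxy (ages grow each step; scheduling a link of
    adequate quality resets that link's age)."""
    age = np.ones(N_LINKS)
    total_aoi = 0.0
    for a in actions:
        state = system2_project(system1_step(state, a))
        age += 1.0
        if state[a] > 0.5:   # assumed threshold for a successful update
            age[a] = 1.0
        total_aoi += age.sum()
    return total_aoi

def plan(state, n_candidates=200):
    """Pick the action sequence with the lowest imagined AoI proxy
    (random shooting, standing in for differentiable trajectory planning)."""
    best_seq, best_cost = None, np.inf
    for _ in range(n_candidates):
        seq = rng.integers(0, N_LINKS, size=HORIZON)
        cost = imagined_rollout(state.copy(), seq)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq, best_cost

state0 = np.array([0.8, 0.6, 0.4])
schedule, cost = plan(state0)
print("planned links:", schedule.tolist(), "imagined AoI proxy:", round(cost, 2))
```

The key point the sketch illustrates is that once Systems 1 and 2 are in place, the scheduler evaluates entire trajectories inside the model, which is why decisions remain available during observation-free intervals.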