🤖 AI Summary
This work addresses user association in millimeter-wave vehicular networks, where dynamic blockages and dense base station deployments make link quality highly volatile and association decisions complex. Conventional multi-armed bandit approaches assume stationary rewards and break down in such non-stationary environments. The paper formulates user association as a non-stationary contextual bandit problem and proposes a fully distributed mobility management framework that operates without channel state information. CUSUM-based change detection dynamically prunes the set of active base stations and tracks abrupt shifts in reward distributions, while a proactive blockage detection mechanism keeps transient signal degradation from distorting reward estimates. Compared to a hypercube-based contextual bandit baseline, the proposed scheme reduces regret by over 40% and improves the network communication rate by up to 33.1%, with robust performance across varying blockage rates and network configurations.
📝 Abstract
In millimeter-wave (mmWave) vehicular networks, dense base station (BS) deployments expand the user association (UA) decision space while dynamic blockages cause link quality fluctuations, posing critical challenges for effective mobility management. Traditional Multi-Armed Bandit (MAB) frameworks assume stationary reward distributions and fail to handle the rapid shifts in the context-reward mapping caused by vehicle mobility and transient blockages. To address this, we propose Blockage-Aware Non-stationary Dynamic Bandit (BAND), a fully distributed, channel state information (CSI)-free mobility management framework for mmWave vehicular networks. BAND formulates UA as a non-stationary contextual bandit problem, enabling online adaptive optimization without central coordination or offline training. It employs a cumulative sum-based change detection (CUSUM-CD) mechanism to dynamically narrow the active BS set, reducing exploration overhead while tracking reward distribution shifts. Proactive blockage detection suppresses the effect of transient signal degradation on reward estimation. Simulations demonstrate over 40% regret reduction and up to 33.1% improvement in network communication rate compared with hypercube-based contextual bandit baselines, with robustness validated across varying blockage rates and network configurations.
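To make the change-detection idea concrete, the following is a minimal sketch of a generic two-sided CUSUM test applied to the reward stream of a single BS. It is not the BAND algorithm or the paper's CUSUM-CD procedure: the class name `CusumDetector`, the helper `detect_changes`, and the `warmup`, `drift`, and `threshold` values are all hypothetical choices made for illustration, and the abstract does not specify how the detector is parameterized or how its alarms feed the active BS set.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CusumDetector:
    """Generic two-sided CUSUM on one reward stream (illustrative only)."""

    warmup: int = 20        # samples used to estimate the pre-change mean (assumed)
    drift: float = 0.05     # slack that absorbs small fluctuations (assumed)
    threshold: float = 1.0  # alarm level for the cumulative statistic (assumed)

    def __post_init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        """Forget all statistics, e.g. after a change has been flagged."""
        self.count = 0
        self.mean = 0.0
        self.g_pos = 0.0  # accumulates evidence of an upward shift
        self.g_neg = 0.0  # accumulates evidence of a downward shift

    def update(self, reward: float) -> bool:
        """Feed one reward sample; return True if a change is flagged."""
        self.count += 1
        if self.count <= self.warmup:
            # Warm-up phase: running estimate of the reference mean.
            self.mean += (reward - self.mean) / self.count
            return False
        deviation = reward - self.mean
        self.g_pos = max(0.0, self.g_pos + deviation - self.drift)
        self.g_neg = max(0.0, self.g_neg - deviation - self.drift)
        if self.g_pos > self.threshold or self.g_neg > self.threshold:
            self.reset()  # restart estimation under the new distribution
            return True
        return False


def detect_changes(rewards: list[float], **kwargs: float) -> list[int]:
    """Return the indices at which the detector raises an alarm."""
    detector = CusumDetector(**kwargs)
    return [t for t, r in enumerate(rewards) if detector.update(r)]


if __name__ == "__main__":
    # Synthetic stream: the mean reward drops from 0.8 to 0.2 at index 30,
    # loosely mimicking a link that becomes blocked.
    stream = [0.8] * 30 + [0.2] * 30
    print(detect_changes(stream))
```

With these assumed settings, the synthetic drop at index 30 is flagged on the second post-change sample. In a bandit setting, an alarm like this would typically trigger a reset of the affected arm's reward estimate; how BAND uses such alarms to narrow the active BS set is described in the paper itself, not in this sketch.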