🤖 AI Summary
This paper addresses bandwidth reservation for latency-sensitive vehicular applications in multi-operator vehicular networks, where bandwidth availability is highly volatile, pricing is uncertain, and safety-critical services impose stringent low-latency and high-reliability requirements. It proposes a regionalized modeling framework coupled with an adaptive composite Markov Decision Process (MDP). The framework integrates Temporal Fusion Transformer (TFT)-based time-series modeling with Dueling Deep Q-Network (Dueling DQN) reinforcement learning. Using a mix of real and synthetic data, it employs multi-stage transfer learning to improve policy generalization across heterogeneous network regions. Experimental results show that the proposed method reduces bandwidth cost by up to 40% while strictly satisfying end-to-end latency constraints and ensuring access fairness. It significantly outperforms baseline strategies, balancing real-time performance, economic efficiency, and cross-operator resource allocation fairness.
📝 Abstract
On-site bandwidth reservation requests are exposed to price fluctuations and fairness issues because bandwidth availability is unpredictable and latency requirements are stringent. Requesting bandwidth in advance can mitigate the impact of these fluctuations and ensure timely access to critical resources. In a multi-Mobile Network Operator (MNO) environment, vehicles need to select cost-effective and reliable resources for their safety-critical applications. This research aims to minimize resource costs by finding the best price among multiple MNOs. It formulates the multi-operator scenario as a Markov Decision Process (MDP) and solves it with a Deep Reinforcement Learning (DRL) algorithm, specifically Dueling Deep Q-Learning. For efficient and stable learning, we propose a novel area-wise approach and an adaptive synthetic MDP that stays close to the real environment. The Temporal Fusion Transformer (TFT) is used to handle time-dependent data and model training. Furthermore, the research leverages Amazon spot price data and adopts a multi-phase training approach in which the agent is trained first on synthetic data and then on real-world data. These phases enable the DRL agent to make informed decisions using insights from historical data and real-time observations. The results show that our model reduces costs by up to 40% compared to scenarios without a policy model in such a complex environment.
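To make the Dueling Deep Q-Learning component more concrete, the following minimal sketch shows the standard dueling aggregation step, which combines a state value V(s) and per-action advantages A(s, a) into Q-values by subtracting the mean advantage. It is not the authors' implementation: the mapping of one action per MNO, the use of negative cost as the reward, and all numbers are assumptions chosen purely for illustration. In a real agent, a neural network would produce both the state value and the advantages.

```python
from statistics import mean


def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) with mean-centered aggregation.

    Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')
    """
    baseline = mean(advantages)
    return [state_value + adv - baseline for adv in advantages]


def greedy_action(q_values):
    """Return the index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Hypothetical numbers: one action per MNO, reward assumed to be negative cost.
state_value = -5.0              # assumed V(s) for the current state
advantages = [1.5, -0.5, -1.0]  # assumed A(s, a) for three MNOs

q_values = dueling_q_values(state_value, advantages)
best_mno = greedy_action(q_values)
print(q_values, best_mno)
```

Subtracting the mean advantage keeps the split between V(s) and A(s, a) identifiable, so the average of the resulting Q-values equals the state value.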