Generalized Multi-Objective Reinforcement Learning with Envelope Updates in URLLC-enabled Vehicular Networks

📅 2024-05-18
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the joint optimization of multi-band (sub-6 GHz/THz) network selection and autonomous driving decisions for ultra-reliable low-latency communication (URLLC)-enhanced vehicular networks, aiming to co-optimize traffic efficiency, driving safety, communication reliability, and latency. We propose a novel convex-hull-based generalized multi-objective reinforcement learning (MORL) framework, enabling unified policy representation under unknown preference profiles via envelope updates—supporting zero-shot preference adaptation and few-shot inference. The framework integrates multi-objective Markov decision process (MDP) modeling, generalized Bellman equations, multi-band radio resource scheduling, and vehicle dynamics control. Experiments demonstrate a 37% reduction in collision rate, a 42% decrease in handover frequency, and significant improvements in communication reliability and data rate. Crucially, we provide the first quantitative validation of the strong coupling among motion control, handover behavior, and communication performance—achieving deep synergy between safe driving and high connectivity.

Technology Category

Application Category

📝 Abstract
We develop a novel multi-objective reinforcement learning (MORL) framework to jointly optimize wireless network selection and autonomous driving policies in a multi-band vehicular network operating on conventional sub-6GHz spectrum and Terahertz frequencies. The proposed framework is designed to 1. maximize the traffic flow and 2. minimize collisions by controlling the vehicle's motion dynamics (i.e., speed and acceleration), and enhance the ultra-reliable low-latency communication (URLLC) while minimizing handoffs (HOs). We cast this problem as a multi-objective Markov Decision Process (MOMDP) and develop solutions for both predefined and unknown preferences of the conflicting objectives. Specifically, deep-Q-network and double deep-Q-network-based solutions are developed first that consider scalarizing the transportation and telecommunication rewards using predefined preferences. We then develop a novel envelope MORL solution which develop policies that address multiple objectives with unknown preferences to the agent. While this approach reduces reliance on scalar rewards, policy effectiveness varying with different preferences is a challenge. To address this, we apply a generalized version of the Bellman equation and optimize the convex envelope of multi-objective Q values to learn a unified parametric representation capable of generating optimal policies across all possible preference configurations. Following an initial learning phase, our agent can execute optimal policies under any specified preference or infer preferences from minimal data samples.Numerical results validate the efficacy of the envelope-based MORL solution and demonstrate interesting insights related to the inter-dependency of vehicle motion dynamics, HOs, and the communication data rate. The proposed policies enable autonomous vehicles to adopt safe driving behaviors with improved connectivity.
Problem

Research questions and friction points this paper is trying to address.

Autonomous Vehicle Networks
Optimization
Safety and Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Envelope Update Method
Multi-objective Learning
Autonomous Driving Optimization
🔎 Similar Papers
No similar papers found.