Multi-Agent Reinforcement Learning Scheduling to Support Low Latency in Teleoperated Driving

📅 2025-05-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the dual challenge of ultra-low end-to-end latency (<100 ms) and high reliability in remote driving, this paper proposes a dynamic wireless resource scheduling method based on multi-agent reinforcement learning (MARL). Specifically, it integrates the centralized-training-with-decentralized-execution MAPPO framework with a greedy resource allocation strategy to jointly optimize RAN-layer parameters—ensuring high-fidelity video and control data transmission while minimizing latency. ns-3 simulations demonstrate that the proposed MAPPO+greedy approach reduces end-to-end latency by 37% compared to baseline methods, maintains stable performance under increasing vehicle density, and significantly outperforms IPPO, highlighting the advantages of centralized training for multi-vehicle coordination. This work establishes a scalable and deployable paradigm for intelligent wireless resource management under stringent QoS requirements.
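The summary's "greedy resource allocation strategy" can be pictured as assigning radio resource blocks one at a time to whichever vehicle currently has the largest unserved demand. A minimal sketch of that idea follows; the function name, the backlog-based demand metric, and the per-RB capacity are illustrative assumptions, not details taken from the paper:

```python
def greedy_allocation(demands, total_rbs):
    """Greedily assign resource blocks (RBs) to the most demanding vehicles.

    demands: dict mapping vehicle id -> buffered bytes awaiting transmission
    total_rbs: number of RBs available in this scheduling slot
    Returns a dict mapping vehicle id -> number of allocated RBs.
    """
    allocation = {v: 0 for v in demands}
    remaining = dict(demands)  # bytes still unserved per vehicle
    bytes_per_rb = 1500        # assumed per-RB capacity, for illustration only
    for _ in range(total_rbs):
        # pick the vehicle with the largest unserved backlog
        v = max(remaining, key=remaining.get)
        if remaining[v] <= 0:
            break  # all demand served; leave surplus RBs idle
        allocation[v] += 1
        remaining[v] -= bytes_per_rb
    return allocation
```

In this toy version the MARL agents would supply (or shape) the per-vehicle demand estimates, and the greedy step then concentrates RBs on the flows most at risk of violating the latency budget.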

📝 Abstract
The teleoperated driving (TD) scenario comes with stringent Quality of Service (QoS) communication constraints, especially in terms of end-to-end (E2E) latency and reliability. In this context, Predictive Quality of Service (PQoS), possibly combined with Reinforcement Learning (RL) techniques, is a powerful tool to estimate QoS degradation and react accordingly. For example, an intelligent agent can be trained to select the optimal compression configuration for automotive data, and reduce the file size whenever QoS conditions deteriorate. However, compression may inevitably compromise data quality, with negative implications for the TD application. An alternative strategy involves operating at the Radio Access Network (RAN) level to optimize radio parameters based on current network conditions, while preserving data quality. In this paper, we propose Multi-Agent Reinforcement Learning (MARL) scheduling algorithms, based on Proximal Policy Optimization (PPO), to dynamically and intelligently allocate radio resources to minimize E2E latency in a TD scenario. We evaluate two training paradigms, i.e., decentralized learning with local observations (IPPO) vs. centralized aggregation (MAPPO), in conjunction with two resource allocation strategies, i.e., proportional allocation (PA) and greedy allocation (GA). We prove via ns-3 simulations that MAPPO, combined with GA, achieves the best results in terms of latency, especially as the number of vehicles increases.
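The IPPO/MAPPO distinction in the abstract comes down to what the value function (critic) is allowed to observe during training. A minimal sketch of that difference, with function names and observation shapes chosen purely for illustration:

```python
import numpy as np

def mappo_critic_input(all_obs):
    """MAPPO (centralized training): the shared critic conditions on the
    concatenation of every agent's local observation."""
    return np.concatenate(all_obs)

def ippo_critic_input(all_obs, agent_id):
    """IPPO (fully decentralized): each agent's critic sees only its own
    local observation."""
    return all_obs[agent_id]

# Two vehicles, each observing a 3-dimensional local network state:
obs = [np.ones(3), np.zeros(3)]
centralized = mappo_critic_input(obs)        # shape (6,): global view
local = ippo_critic_input(obs, agent_id=1)   # shape (3,): own view only
```

Execution remains decentralized in both cases; only the training-time value estimate differs, which is why MAPPO can better credit coordinated scheduling decisions across vehicles.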
Problem

Research questions and friction points this paper is trying to address.

Minimizing end-to-end latency in teleoperated driving using MARL
Optimizing radio resource allocation to maintain QoS in TD
Comparing decentralized vs. centralized MARL training paradigms for latency reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Reinforcement Learning for latency minimization
Proximal Policy Optimization for dynamic resource allocation
Centralized aggregation with greedy allocation strategy
Giacomo Avanzi
Department of Information Engineering, University of Padova, Italy.
M. Giordani
Department of Information Engineering, University of Padova, Italy.
Michele Zorzi
Department of Information Engineering, University of Padova, Italy.
electrical engineering · networking · wireless communications · wireless networks