Autonomous Vehicles Using Multi-Agent Reinforcement Learning for Routing Decisions Can Harm Urban Traffic

📅 2025-02-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This position paper identifies three challenges that arise when autonomous vehicles (AVs) use multi-agent reinforcement learning (MARL) for routing decisions in urban traffic shared with human drivers: convergence failure, unreliable training, and high deployment risk. Using traffic simulation, the authors evaluate MADDPG, QMix, and other MARL algorithms under centralized and decentralized paradigms. The experiments show that, even in trivial cases, MARL policies often fail to converge to an optimal solution or require long training periods. Human drivers who adapt to AV behavior add non-stationarity, and real-world training risks longer human travel times and higher CO₂ emissions. Centralized training can improve convergence in some cases but raises privacy concerns about travelers' destination data. The authors argue for a shift from pure algorithmic optimization to realistic benchmarks, cautious phased deployment, and tools for monitoring and regulating AV routing behavior, with the goal of sustainable and equitable urban mobility.

📝 Abstract

Autonomous vehicles (AVs) using Multi-Agent Reinforcement Learning (MARL) for simultaneous route optimization may destabilize traffic environments, with human drivers possibly experiencing longer travel times. We study this interaction by simulating human drivers and AVs. Our experiments with standard MARL algorithms reveal that, even in trivial cases, policies often fail to converge to an optimal solution or require long training periods. The problem is amplified by the fact that we cannot rely entirely on simulated training, as there are no accurate models of human routing behavior. At the same time, real-world training in cities risks destabilizing urban traffic systems, increasing externalities such as CO₂ emissions, and introducing non-stationarity as human drivers adapt unpredictably to AV behaviors. Centralization can improve convergence in some cases; however, it raises privacy concerns about travelers' destination data. In this position paper, we argue that future research must prioritize realistic benchmarks, cautious deployment strategies, and tools for monitoring and regulating AV routing behaviors to ensure sustainable and equitable urban mobility systems.
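
To make the convergence concern concrete, the following is a minimal illustrative sketch, not the authors' experimental setup. It assumes a hypothetical two-route network with linear congestion costs and uses independent epsilon-greedy value learners in place of the MARL algorithms the paper evaluates. All names and constants (`FREE_FLOW`, `SLOPE`, `simulate`, `optimal_mean_time`) are assumptions made for illustration. Each agent learns from its own travel time only, so every other agent's exploration changes the environment it sees, which is the non-stationarity the abstract refers to.

```python
import random

# Hypothetical two-route network: travel time on each route grows linearly
# with the number of vehicles that choose it (values are illustrative only).
FREE_FLOW = (5.0, 15.0)   # minutes on an empty road
SLOPE = (1.0, 0.25)       # extra minutes per vehicle on the route


def travel_time(route, flow):
    """Travel time on `route` when `flow` vehicles use it."""
    return FREE_FLOW[route] + SLOPE[route] * flow


def mean_time(flow_on_route0, num_agents):
    """Average travel time over all agents for a given split of vehicles."""
    flows = (flow_on_route0, num_agents - flow_on_route0)
    total = sum(flows[r] * travel_time(r, flows[r]) for r in (0, 1))
    return total / num_agents


def optimal_mean_time(num_agents):
    """System-optimal average travel time, found by brute force over splits."""
    return min(mean_time(k, num_agents) for k in range(num_agents + 1))


def simulate(num_agents=10, episodes=500, alpha=0.1, epsilon=0.1, seed=0):
    """Independent epsilon-greedy learners; returns mean travel time per episode."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(num_agents)]  # one value per route per agent
    history = []
    for _ in range(episodes):
        choices = []
        for agent_q in q:
            if rng.random() < epsilon:
                choices.append(rng.randrange(2))  # explore a random route
            else:
                choices.append(0 if agent_q[0] >= agent_q[1] else 1)  # exploit
        flows = (choices.count(0), choices.count(1))
        for agent_q, route in zip(q, choices):
            reward = -travel_time(route, flows[route])  # selfish reward
            agent_q[route] += alpha * (reward - agent_q[route])
        history.append(mean_time(flows[0], num_agents))
    return history


if __name__ == "__main__":
    history = simulate()
    print("system optimum:", optimal_mean_time(10))
    print("mean over last 50 episodes:", sum(history[-50:]) / 50)
```

Comparing the tail of `history` with `optimal_mean_time` gives a rough sense of the gap between what selfish independent learners reach and the system optimum. The result depends on the assumed cost constants and learning parameters, so it should be read as a toy illustration rather than evidence for the paper's findings.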

Problem

Research questions and friction points this paper is trying to address.

MARL-based AV routing may destabilize urban traffic and lengthen human drivers' travel times.

Simulated training lacks accurate human behavior models.

Real-world AV training risks destabilizing urban traffic and increasing CO₂ emissions.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluation of standard MARL algorithms for AV route choice

Simulation of interacting human drivers and AVs

Analysis of centralization: better convergence in some cases, at the cost of destination privacy
