🤖 AI Summary
Standardized, realistic city-scale benchmarks for evaluating reinforcement learning (RL) in the collective routing of connected autonomous vehicles (CAVs) are currently lacking. This paper introduces a large-scale multi-agent RL (MARL) routing benchmark for urban road networks that covers 29 real-world traffic networks paired with realistic demand patterns. It provides a standardized, modular evaluation framework comprising predefined tasks, four state-of-the-art MARL algorithm implementations (e.g., MAPPO, QMix), three baseline methods, and domain-specific performance metrics, and its results initiate the first leaderboard for MARL in large-scale urban routing. Using SUMO-based simulation, the experiments show that, despite lengthy and costly training, current state-of-the-art MARL methods rarely outperform humans in realistic city settings, which exposes their difficulty in scaling. These findings motivate further work on collective routing algorithms for urban CAV systems.
📝 Abstract
Connected Autonomous Vehicles (CAVs) promise to reduce congestion in future urban networks, potentially by optimizing their routing decisions. Unlike those of human drivers, these decisions can follow collective, data-driven policies developed by machine learning algorithms. Reinforcement learning (RL) can facilitate the development of such collective routing strategies, yet standardized and realistic benchmarks are missing. To that end, we present URB: Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles. URB is a comprehensive benchmarking environment that unifies evaluation across 29 real-world traffic networks paired with realistic demand patterns. URB comes with a catalog of predefined tasks, four state-of-the-art multi-agent RL (MARL) algorithm implementations, three baseline methods, domain-specific performance metrics, and a modular configuration scheme. Our results suggest that, despite lengthy and costly training, state-of-the-art MARL algorithms rarely outperform humans. The experimental results reported in this paper initiate the first leaderboard for MARL in large-scale urban routing optimization and reveal that current approaches struggle to scale, emphasizing the urgent need for advances in this domain.