🤖 AI Summary
Adaptive traffic signal control via multi-agent reinforcement learning (MARL) remains largely untested for real-world signal timing plans: prior studies rely on simplifying signal-timing assumptions and predominantly value-based methods, even though recent literature suggests policy-based methods may perform better in partially observable environments.
Method: This paper proposes a cooperative control framework based on Multi-Agent Proximal Policy Optimization (MA-PPO), the first to apply MA-PPO to multi-intersection coordination under full phase constraints. It uses a centralized critic within the Centralized Training with Decentralized Execution (CTDE) paradigm, with each agent selecting among up to eight signal phases as implemented in field controllers, directly outputting deployable dynamic signal plans.
Results: Validated via Vissim-MaxTime software-in-the-loop simulation of a real-world seven-intersection corridor, the approach reduces travel time for the two through movements by about 14% and 29%, respectively, compared to the field-implemented actuated coordinated control (ASC), and shows good stability, robustness, and adaptability to changes in traffic demand.
📝 Abstract
The very few studies that have attempted to formulate multi-agent reinforcement learning (RL) algorithms for adaptive traffic signal control have mainly used value-based RL methods, although recent literature has shown that policy-based methods may perform better in partially observable environments. Additionally, because of the simplifying assumptions on signal timing made almost universally across previous studies, RL methods remain largely untested for real-world signal timing plans. This study formulates a multi-agent proximal policy optimization (MA-PPO) algorithm to implement adaptive and coordinated traffic control along an arterial corridor. The formulated MA-PPO has a centralized critic architecture under the centralized training and decentralized execution (CTDE) framework. All agents are formulated to allow selection and implementation of up to eight signal phases, as commonly implemented in field controllers. The formulated algorithm is tested on a simulated real-world corridor with seven intersections, complete traffic movements and signal phases, actual traffic volumes, and network geometry including intersection spacings. The performance of the formulated MA-PPO adaptive control algorithm is compared with the field-implemented coordinated and actuated signal control (ASC) plans modeled using Vissim-MaxTime software-in-the-loop simulation (SIL). The speed of convergence for each agent largely depended on the size of the action space, which in turn depended on the number and sequence of signal phases. Compared with the currently implemented ASC signal timings, MA-PPO showed a travel time reduction of about 14% and 29%, respectively, for the two through movements across the entire test corridor. Through volume sensitivity experiments, the formulated MA-PPO showed good stability, robustness, and adaptability to changes in traffic demand.
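The centralized-critic CTDE structure described above can be illustrated with a minimal sketch: each of the seven intersection agents maps its own local observation to a distribution over up to eight signal phases (decentralized execution), while a single critic sees the joint observation during training and supplies the baseline for the PPO clipped surrogate objective. All dimensions, the linear policy/value parameterization, and the one-step advantage are illustrative assumptions, not the paper's actual network architecture or training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 7        # intersections along the corridor (from the paper)
OBS_DIM = 12        # per-intersection local observation size (assumed)
N_PHASES = 8        # up to eight signal phases per agent (from the paper)
CLIP_EPS = 0.2      # PPO clipping parameter (common default, assumed)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Decentralized actors: one linear policy per intersection (local obs -> phase probs).
actor_w = [rng.normal(0, 0.1, (OBS_DIM, N_PHASES)) for _ in range(N_AGENTS)]
# Centralized critic: sees the concatenated joint observation, training only.
critic_w = rng.normal(0, 0.1, (N_AGENTS * OBS_DIM,))

def act(i, obs):
    """Decentralized execution: agent i picks a phase from its local observation."""
    probs = softmax(obs @ actor_w[i])
    a = rng.choice(N_PHASES, p=probs)
    return a, probs[a]

def value(joint_obs):
    """Centralized critic evaluates the joint state (used only during training)."""
    return joint_obs @ critic_w

def ppo_clip_loss(ratio, advantage):
    """PPO clipped surrogate objective, negated for minimization."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS) * advantage
    return -np.minimum(unclipped, clipped).mean()

# Toy rollout: each agent acts on its local observation; the centralized
# critic provides the baseline for a one-step advantage estimate.
local_obs = rng.normal(size=(N_AGENTS, OBS_DIM))
joint_obs = local_obs.reshape(-1)
actions, old_probs = zip(*(act(i, local_obs[i]) for i in range(N_AGENTS)))
reward = -float(rng.uniform(10, 60))       # e.g. negative corridor travel time
advantage = reward - value(joint_obs)

# After a hypothetical policy update, new action probabilities give the ratio.
new_probs = np.array(old_probs) * 1.1      # stand-in for the updated policy
ratios = new_probs / np.array(old_probs)
loss = ppo_clip_loss(ratios, advantage)
print(f"clipped PPO loss on toy rollout: {loss:.4f}")
```

The clipping keeps each agent's policy update near the behavior policy, which is one reason PPO-style methods are attractive when many agents update simultaneously and the environment is only partially observable from any one intersection.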