🤖 AI Summary
This study investigates how the collective behaviors of reinforcement learning (RL)-driven autonomous vehicles (AVs) affect urban traffic flow in mixed-autonomy environments. We propose PARCOUR, a multi-agent RL-based simulation framework that models dynamic route choice and human-AV interactions under diverse AV behavioral strategies, including self-interested and cooperative policies. Our key contribution is the first systematic characterization of how distinct AV behavioral paradigms differentially affect individual travel efficiency versus system-wide traffic performance. Experimental results show that self-interested AVs reduce their own travel time by up to 5%, consistently outperforming human drivers, but may exacerbate delays for others. In contrast, moderately cooperative strategies improve aggregate network throughput without substantially compromising individual efficiency. These findings demonstrate the promise of multi-agent RL for traffic coordination while highlighting the inherent trade-offs in designing socially optimal AV decision-making policies.
📝 Abstract
This study examines the potential impact of reinforcement learning (RL)-enabled autonomous vehicles (AVs) on urban traffic flow in a mixed traffic environment. We focus on a simplified day-to-day route choice problem in a multi-agent setting. We consider a city network in which human drivers choose routes to reach their destinations in minimum travel time. We then convert one-third of the population into AVs, which are RL agents employing the Deep Q-learning algorithm. We define a set of optimization targets, which we call behaviors: selfish, collaborative, competitive, social, altruistic, and malicious. We impose a selected behavior on the AVs through their rewards. We run our simulations using PARCOUR, our in-house RL framework. Our simulations reveal that AVs improve their travel times by up to 5%, with varying impacts on human drivers' travel times depending on the AV behavior. In all cases where AVs adopt a self-serving behavior, they achieve shorter travel times than human drivers. Our findings highlight how the complexity of the learning task differs across target behaviors. We demonstrate that the multi-agent RL setting is applicable to collective routing on traffic networks, though the AVs' impact on coexisting parties varies greatly with the behavior adopted.
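The abstract states that each behavior is imposed on the AVs through their rewards but does not give the reward formulas. The following is only an illustrative sketch of one way this could work, assuming each behavior is a weighted sum of travel-time terms. The function name `behavior_reward`, the table `BEHAVIOR_WEIGHTS`, and every weight value are hypothetical and are not taken from the paper or from PARCOUR.

```python
from statistics import mean

# HYPOTHETICAL weights, not from the paper. Each tuple weights four
# travel-time terms: (own, AV-group mean, human-group mean, all-traveler mean).
# A positive weight means the agent tries to reduce that term;
# a negative weight means the agent tries to increase it.
BEHAVIOR_WEIGHTS = {
    "selfish":       (1.0, 0.0,  0.0, 0.0),
    "collaborative": (0.0, 1.0,  0.0, 0.0),
    "competitive":   (1.0, 0.0, -1.0, 0.0),
    "social":        (0.0, 0.0,  0.0, 1.0),
    "altruistic":    (0.0, 0.0,  1.0, 0.0),
    "malicious":     (0.0, 0.0, -1.0, 0.0),
}


def behavior_reward(behavior, own_time, av_times, human_times):
    """Return an assumed reward for one AV after one simulated day.

    own_time    -- this AV's travel time
    av_times    -- travel times of all AVs (including this one)
    human_times -- travel times of all human drivers
    """
    w_own, w_av, w_human, w_all = BEHAVIOR_WEIGHTS[behavior]
    cost = (
        w_own * own_time
        + w_av * mean(av_times)
        + w_human * mean(human_times)
        + w_all * mean(list(av_times) + list(human_times))
    )
    # RL agents maximize reward, so the weighted travel-time cost is negated.
    return -cost


if __name__ == "__main__":
    av_times = [10.0, 12.0]
    human_times = [11.0, 13.0, 15.0]
    for name in BEHAVIOR_WEIGHTS:
        print(name, behavior_reward(name, 10.0, av_times, human_times))
```

Under these assumed weights, a selfish AV is rewarded only for its own shorter trip, while a malicious AV is rewarded when human travel times grow. A Deep Q-learning agent would receive this scalar as its reward after each day's route choice.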