Autonomous vehicles need social awareness to find optima in multi-agent reinforcement learning routing games

📅 2025-10-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In multi-agent reinforcement learning (MARL) routing games for autonomous vehicles (AVs), selfish reward formulations impede convergence, induce policy oscillations, and can destabilize traffic. Method: This paper proposes a socially aware cooperative optimization mechanism centered on an intrinsic reward signal derived from a counterfactual marginal cost matrix, which quantifies how each agent's route choice affects total system travel time while balancing global efficiency and individual rationality. This non-selfish reward preserves the game's Nash equilibria while suppressing policy oscillations. Contribution/Results: The mechanism significantly improves training stability and convergence speed. Experiments on both a synthetic toy network and the real-world Saint-Arnoult road network demonstrate stable convergence to the optimal solution under the proposed method, whereas mainstream MARL baselines fail to converge. This work establishes a stable, distributed learning paradigm for large-scale cooperative AV routing.

📝 Abstract
Previous work has shown that when multiple selfish Autonomous Vehicles (AVs) are introduced to future cities and start learning optimal routing strategies using Multi-Agent Reinforcement Learning (MARL), they may destabilize traffic systems, as they would require a significant amount of time to converge to the optimal solution, equivalent to years of real-world commuting. We demonstrate that moving beyond the selfish component in the reward significantly relieves this issue. If each AV, apart from minimizing its own travel time, aims to reduce its impact on the system, this will be beneficial not only for the system-wide performance but also for each individual player in this routing game. By introducing an intrinsic reward signal based on the marginal cost matrix, we significantly reduce training time and achieve convergence more reliably. Marginal cost quantifies the impact of each individual action (route-choice) on the system (total travel time). Including it as one of the components of the reward can reduce the degree of non-stationarity by aligning agents' objectives. Notably, the proposed counterfactual formulation preserves the system's equilibria and avoids oscillations. Our experiments show that training MARL algorithms with our novel reward formulation enables the agents to converge to the optimal solution, whereas the baseline algorithms fail to do so. We show these effects in both a toy network and the real-world network of Saint-Arnoult. Our results optimistically indicate that social awareness (i.e., including marginal costs in routing decisions) improves both the system-wide and individual performance of future urban systems with AVs.
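The core idea above can be made concrete with a small sketch: the marginal cost of an agent's route choice is computed counterfactually, as the system's total travel time with the agent present minus the total travel time with that agent removed, and this term is added to the agent's own travel time in the reward. The following is a minimal illustration on a toy two-route network; the linear latency function, the `weight` parameter, and all names are assumptions for illustration, not the paper's exact formulation.

```python
def travel_time(flow, free_flow=1.0, capacity=10.0):
    """Per-agent travel time on a route: grows linearly with the flow on it."""
    return free_flow * (1.0 + flow / capacity)

def system_travel_time(flows):
    """Total travel time summed over all agents on all routes."""
    return sum(n * travel_time(n) for n in flows.values())

def marginal_cost(flows, route):
    """Counterfactual marginal cost of one agent choosing `route`:
    system cost with the agent minus system cost without it."""
    with_agent = system_travel_time(flows)
    reduced = dict(flows)
    reduced[route] -= 1
    without_agent = system_travel_time(reduced)
    return with_agent - without_agent

def social_reward(flows, route, weight=0.5):
    """Socially aware reward: own travel time plus a weighted marginal-cost
    term, negated so that a higher reward means lower cost."""
    own = travel_time(flows[route])
    return -(own + weight * marginal_cost(flows, route))

# 6 agents currently on route A, 4 on route B
flows = {"A": 6, "B": 4}
reward_A = social_reward(flows, "A")
reward_B = social_reward(flows, "B")
```

Because the marginal cost exceeds an agent's own travel time under congestion (it includes the delay imposed on everyone else), the social reward penalizes crowding onto the busier route more strongly than a purely selfish reward would, which is the alignment effect the abstract attributes to reduced non-stationarity.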
Problem

Research questions and friction points this paper is trying to address.

AVs destabilize traffic in multi-agent reinforcement learning routing games
Selfish reward functions cause slow convergence and system instability
Social awareness through marginal cost improves system and individual performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intrinsic reward based on marginal cost matrix
Counterfactual formulation preserves system equilibria
Social awareness improves system-wide and individual performance