🤖 AI Summary
In multi-vehicle cooperative autonomous driving, multi-agent reinforcement learning (MARL) suffers from low sample efficiency and difficulty in designing reward functions that jointly optimize traffic efficiency, safety, and action rationality. To address this, we propose a gradient-aware differentiated reward method grounded in a steady-state transition system. This is the first work to incorporate state-transition gradients into MARL reward shaping, integrating both traffic-flow steady-state characteristics and dynamic evolution information to jointly guide efficiency, safety, and rationality. The method is compatible with mainstream MARL frameworks—including MAPPO, MADQN, and QMIX—and exhibits strong environmental adaptability and scalability. Experiments across varying autonomous vehicle penetration rates demonstrate significant improvements over centralized-reward baselines: 18.7% higher throughput, 32.4% lower collision rate, and 26.1% better action rationality, alongside markedly accelerated convergence.
📝 Abstract
Reinforcement learning (RL) shows great potential for optimizing multi-vehicle cooperative driving strategies through the state-action-reward feedback loop, but it still faces challenges such as low sample efficiency. This paper proposes a differentiated reward method based on steady-state transition systems, which incorporates state-transition gradient information into the reward design by analyzing traffic-flow characteristics, aiming to optimize action selection and policy learning in multi-vehicle cooperative decision-making. The performance of the proposed method is validated in MARL algorithms such as MAPPO, MADQN, and QMIX under varying autonomous vehicle penetration rates. The results show that the differentiated reward method significantly accelerates training convergence and outperforms centralized-reward and other baseline schemes in terms of traffic efficiency, safety, and action rationality. Additionally, the method demonstrates strong scalability and environmental adaptability, providing a novel approach for multi-agent cooperative decision-making in complex traffic scenarios.
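To make the core idea concrete, here is a minimal sketch of how a state-transition gradient term might be folded into a per-agent reward. This is an illustrative assumption, not the paper's actual formulation: the function name `differentiated_reward`, the Euclidean distance to a steady-state traffic profile, and the weight `beta` are all hypothetical choices standing in for the paper's steady-state transition system.

```python
import numpy as np

def differentiated_reward(base_reward, state_prev, state_curr,
                          steady_state, beta=0.5):
    """Hypothetical sketch: shape one agent's reward with a
    state-transition 'gradient' term measuring whether its action
    moved local traffic state toward a steady-state profile
    (e.g. target speeds and headways)."""
    # Distance to the steady-state profile before and after the transition
    d_prev = np.linalg.norm(state_prev - steady_state)
    d_curr = np.linalg.norm(state_curr - steady_state)
    # Positive when the transition moves the state closer to steady
    # state, negative when it moves away
    gradient_bonus = d_prev - d_curr
    return base_reward + beta * gradient_bonus

# Example: speed rises from 5 to 6 m/s toward a 10 m/s steady-state
# target at fixed headway, so the shaped reward exceeds the base reward
r = differentiated_reward(1.0,
                          np.array([5.0, 2.0]),
                          np.array([6.0, 2.0]),
                          np.array([10.0, 2.0]))
```

Because each agent's bonus depends on its own transition, rewards differ across vehicles rather than sharing one centralized scalar, which is what drives the faster credit assignment the abstract reports.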