🤖 AI Summary
To address the real-time and safety challenges posed by unpredictable human driving behavior in dynamic traffic scenarios, such as multi-lane highways and unsignalized T-intersections, this paper proposes a two-layer interactive decision-making architecture. The upper layer employs interactive Monte Carlo tree search (iMCTS) for rational, online game-theoretic reasoning, while the lower layer integrates value and policy networks trained via PPO, SAC, and A2C, coupled with dynamic trajectory planning and closed-loop control. Evaluated in CARLA, the architecture reduces computational overhead, improves interactive inference efficiency by 32%, decreases collision rate by 47%, and enhances traffic throughput by 21%, outperforming state-of-the-art methods in both safety and interaction plausibility. The core innovation lies in the tight coupling of iMCTS with heterogeneous deep reinforcement learning algorithms, achieving, for the first time, joint optimization of high real-time performance and robustness.
📝 Abstract
In complex real-world traffic environments, autonomous vehicles (AVs) must interact with other traffic participants while making real-time, safety-critical decisions. The unpredictability of human behavior poses significant challenges, particularly in dynamic scenarios such as multi-lane highways and unsignalized T-intersections. To address this challenge, we design a bi-level interaction decision-making algorithm (BIDA) that integrates interactive Monte Carlo tree search (MCTS) with deep reinforcement learning (DRL), aiming to enhance the interaction rationality, efficiency, and safety of AVs in key dynamic traffic scenarios. Specifically, we adopt three types of DRL algorithms to construct a reliable value network and policy network, which guide the online deduction process of interactive MCTS by assisting in value update and node selection. Then, a dynamic trajectory planner and a trajectory tracking controller are designed and implemented in CARLA to ensure smooth execution of planned maneuvers. Experimental evaluations demonstrate that BIDA not only enhances interactive deduction and reduces computational costs, but also outperforms recent benchmarks, exhibiting superior safety, efficiency, and interaction rationality under varying traffic conditions.
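
The abstract says that the policy and value networks guide the tree search "by assisting in value update and node selection" but does not give the rule. As an illustration only, the sketch below uses the widely known PUCT rule from AlphaZero-style search, which is an assumption and not necessarily what BIDA does. The `Node` class, `select_child`, `backup`, the constant `c_puct`, and the action names are all hypothetical. The sketch also keeps a single ego-vehicle value per node, whereas an interactive, game-theoretic search would likely track values per agent.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One search-tree node (hypothetical structure, not from the paper)."""
    prior: float                 # policy-network probability of the action leading here
    visits: int = 0              # number of simulations that passed through this node
    value_sum: float = 0.0       # sum of backed-up value estimates
    children: dict = field(default_factory=dict)  # action name -> Node

    def q(self) -> float:
        # Mean backed-up value; 0.0 for a node that has never been visited.
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node: Node, c_puct: float = 1.5):
    """Node selection: return the (action, child) pair maximising Q + U (PUCT)."""
    total = sum(child.visits for child in node.children.values())

    def score(item):
        _, child = item
        # Exploration bonus: large for high-prior, rarely visited children.
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + u

    return max(node.children.items(), key=score)


def backup(path, leaf_value: float) -> None:
    """Value update: add the value-network estimate to every node on the path."""
    for node in path:
        node.visits += 1
        node.value_sum += leaf_value


# Usage: priors would come from a trained policy network; here they are made up.
root = Node(prior=1.0)
root.children = {"keep_lane": Node(prior=0.7), "change_left": Node(prior=0.3)}
action, child = select_child(root)
backup([root, child], leaf_value=-1.0)  # pretend the value network rated the leaf poorly
```

With this rule the policy prior steers early simulations toward likely maneuvers, and the backed-up value estimates gradually take over as visit counts grow, so unpromising branches receive fewer simulations.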