🤖 AI Summary
This work addresses two challenges in deploying deep reinforcement learning for high-speed autonomous driving: unsafe trial-and-error training and the difficulty of balancing efficiency with safety. The authors propose MoE-RM-SRL, a unified safe reinforcement learning framework that integrates safe distance (SD), reward machines (RM), and a sparsely gated mixture-of-experts (MoE) architecture. A rule-aware reward built from SD and RM promotes safety during training, while activating only a minimal set of experts enables efficient decision-making. The experts are DQNs, and the system is validated in the CARLA simulator and on a six-degree-of-freedom driver-in-the-loop VR platform. Experiments show that MoE-RM-SRL substantially outperforms state-of-the-art baselines in randomized two-lane traffic, improving safety and traffic efficiency simultaneously, and that it extends naturally to more complex settings such as multi-lane driving, on-ramp merging, and off-ramp exiting.
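A rule-aware reward of this kind can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' formulation: the abstract does not give the SD formula, the reward-machine states, or the reward weights, so `safe_distance` (a reaction-plus-braking-distance rule), the `RewardMachine` stages (`keep`, `changing`, `done`), its events, and every constant are hypothetical. The reward machine contributes a stage-wise term on each transition, while the SD and speed-limit checks contribute rule-violation penalties.

```python
# Hypothetical SD rule and reward machine; stage names, events and constants are illustrative.


def safe_distance(v_ego, v_lead, t_react=1.0, a_ego=6.0, a_lead=8.0, d_min=5.0):
    """Gap (m) that lets the ego vehicle stop behind a lead vehicle braking harder.

    Reaction distance plus the difference of braking distances, floored at d_min.
    """
    d = v_ego * t_react + v_ego ** 2 / (2 * a_ego) - v_lead ** 2 / (2 * a_lead)
    return max(d, d_min)


class RewardMachine:
    """Finite-state machine whose transitions emit stage-wise rewards."""

    # (state, event) -> (next_state, reward)
    TRANSITIONS = {
        ("keep", "start_change"): ("changing", 0.0),
        ("changing", "in_target_lane"): ("done", 1.0),
        ("changing", "abort"): ("keep", -0.5),
        ("done", "stabilized"): ("keep", 0.2),
    }

    def __init__(self):
        self.state = "keep"

    def step(self, event):
        """Advance on an event; unmatched events keep the state and give 0 reward."""
        self.state, reward = self.TRANSITIONS.get((self.state, event), (self.state, 0.0))
        return reward


def rule_aware_reward(rm, event, gap, v_ego, v_lead, v_limit=33.3):
    """RM stage reward + speed-efficiency term + SD and speed-limit penalties."""
    r_stage = rm.step(event)
    r_eff = 0.1 * min(v_ego / v_limit, 1.0)      # reward progress up to the limit
    r_limit = -1.0 if v_ego > v_limit else 0.0   # rule: do not exceed the speed limit
    r_sd = -1.0 if gap < safe_distance(v_ego, v_lead) else 0.0  # rule: keep the SD
    return r_stage + r_eff + r_limit + r_sd


if __name__ == "__main__":
    rm = RewardMachine()
    print(rule_aware_reward(rm, "start_change", gap=50.0, v_ego=20.0, v_lead=20.0), rm.state)
```

Because the reward machine carries state across time steps, the same physical situation can earn different rewards depending on which stage of a maneuver the agent is in, which is what lets it encode stage-wise objectives that a memoryless reward cannot.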
📝 Abstract
Deep reinforcement learning (DRL) offers a compelling route to decision-making for advanced autonomous vehicles (AVs), yet its trial-and-error nature makes it difficult to guarantee safety during training and to achieve both safety and efficiency at deployment. We propose a unified safe reinforcement learning (SRL) framework, termed MoE-RM-SRL, that integrates safe distance (SD), reward machines (RM), and mixture-of-experts (MoE). For training, SD and RM jointly shape a rule-aware reward that encodes highway traffic regulations and stage-wise objectives, enabling safe and reliable behavior without sacrificing efficiency. For deployment, we introduce a sparsely gated MoE layer comprising up to 11 deep Q-networks (DQNs); an SD-based gating rule activates a minimal set of experts for lane-keeping and lane-changing, mitigating the instability, discontinuities, and impulsive transients commonly induced by switching between heterogeneous controllers (e.g., MPC/rule-based modules and learned policies). We implement the proposed architecture in CARLA and integrate it with a 6-DoF driver-in-the-loop virtual-reality (DiL-VR) platform. Experiments in stochastic two-lane traffic show that MoE-RM-SRL substantially improves safety and efficiency over state-of-the-art baselines, and the framework extends naturally to multi-lane driving as well as on-ramp merging and off-ramp exiting scenarios.
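The SD-gated sparse MoE layer can be pictured with the sketch below. It is a minimal illustration, not the authors' implementation: the abstract does not specify the gate, so the scoring function `sd_gate_logits`, the expert roles and their 5/3/3 split, the top-k selection, and the action mask are all hypothetical, and the fixed-output `make_expert` lambdas stand in for trained DQNs. What it shows is that only the k selected experts are evaluated, and that their Q-values are blended by normalized gate weights instead of hard-switching between separate controllers.

```python
import math

ACTIONS = ("keep", "left", "right")


def make_expert(q_fixed):
    """Stand-in for a trained DQN expert: returns fixed Q-values over ACTIONS."""
    return lambda state: list(q_fixed)


def sd_gate_logits(roles, gaps, d_safe):
    """Hypothetical SD-based gate: score each expert by the normalized
    safe-distance margin of the lane its role targets."""
    return [(gaps[r] - d_safe) / d_safe for r in roles]


def top_k_gate(logits, k):
    """Keep the k largest logits and softmax over them; other experts get no weight."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


def moe_act(state, experts, roles, gaps, d_safe, k=1):
    """Evaluate only the active experts, blend their Q-values, mask unsafe actions."""
    weights = top_k_gate(sd_gate_logits(roles, gaps, d_safe), k)
    q = [0.0] * len(ACTIONS)
    for i, w in weights.items():
        for a, qa in enumerate(experts[i](state)):
            q[a] += w * qa
    # An action whose target-lane gap is below the safe distance is masked out.
    masked = [qa if gaps[ACTIONS[a]] >= d_safe else -math.inf for a, qa in enumerate(q)]
    # If every action is masked, max() returns index 0 ("keep") as a fallback.
    best = max(range(len(ACTIONS)), key=lambda a: masked[a])
    return ACTIONS[best], weights


# 11 experts: 5 lane-keeping, 3 left-change, 3 right-change (illustrative split).
roles = ["keep"] * 5 + ["left"] * 3 + ["right"] * 3
Q_BY_ROLE = {"keep": (1.0, 0.2, 0.1), "left": (0.3, 1.2, 0.0), "right": (0.3, 0.0, 1.2)}
experts = [make_expert(Q_BY_ROLE[r]) for r in roles]

if __name__ == "__main__":
    gaps = {"keep": 20.0, "left": 60.0, "right": 10.0}  # meters, per target lane
    print(moe_act({}, experts, roles, gaps, d_safe=30.0))
```

With `k=1` the gate reduces to picking a single expert; a larger `k` blends several experts, which is one way to smooth the transition between lane-keeping and lane-changing behavior.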