Safe Reinforcement Learning of Autonomous Highway Driving: A Unified Framework for Safety and Efficiency

📅 2026-06-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two obstacles to deploying deep reinforcement learning for high-speed autonomous driving: unsafe trial-and-error training and the difficulty of balancing safety and efficiency. The authors propose MoE-RM-SRL, a unified safe reinforcement learning framework that integrates safe distance (SD), reward machines (RM), and a sparsely gated mixture-of-experts (MoE) architecture. SD and RM jointly shape a rule-aware reward that encodes highway traffic regulations and stage-wise objectives. An SD-based gating rule activates only a minimal set of DQN experts (up to 11) for lane-keeping and lane-changing, which avoids the instability caused by switching between heterogeneous controllers. The system is implemented in the CARLA simulator and integrated with a six-degree-of-freedom driver-in-the-loop VR platform. In stochastic two-lane traffic, MoE-RM-SRL substantially outperforms state-of-the-art baselines in both safety and traffic efficiency, and the framework extends naturally to multi-lane driving, on-ramp merging, and off-ramp exiting.
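The rule-aware reward can be pictured as a small finite-state machine whose transitions are triggered by safe-distance checks. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the kinematic `safe_distance` formula and its default parameters, the state names (`keep`, `change`, `crash`), the event labels, and all reward values are hypothetical.

```python
from typing import Dict, Tuple


def safe_distance(v_ego: float, v_lead: float, t_react: float = 1.0,
                  a_ego: float = 6.0, a_lead: float = 8.0,
                  d_min: float = 2.0) -> float:
    """Hypothetical longitudinal safe distance in metres (speeds in m/s, decelerations in m/s^2).

    Assumed form, not the paper's: distance travelled during the reaction time plus
    the ego braking distance minus the lead braking distance (clipped at zero),
    plus a standstill margin d_min.
    """
    d = v_ego * t_react + v_ego ** 2 / (2 * a_ego) - v_lead ** 2 / (2 * a_lead)
    return max(d, 0.0) + d_min


# Hypothetical reward machine: (state, event) -> (next_state, reward).
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, float]] = {
    ("keep", "safe_gap"): ("keep", 0.1),         # small bonus for rule-compliant cruising
    ("keep", "unsafe_gap"): ("keep", -1.0),      # penalise violating the safe distance
    ("keep", "start_change"): ("change", 0.0),   # stage 1 -> stage 2
    ("change", "safe_gap"): ("change", 0.0),
    ("change", "unsafe_gap"): ("change", -1.0),
    ("change", "finish_change"): ("keep", 1.0),  # bonus only for a completed lane change
}


def label(gap: float, sd: float, collided: bool,
          started_change: bool, finished_change: bool) -> str:
    """Map raw observations to one RM event; safety events take priority."""
    if collided:
        return "collision"
    if gap < sd:
        return "unsafe_gap"
    if finished_change:
        return "finish_change"
    if started_change:
        return "start_change"
    return "safe_gap"


class RewardMachine:
    def __init__(self) -> None:
        self.state = "keep"

    def step(self, event: str) -> float:
        if event == "collision":
            self.state = "crash"  # absorbing failure state
            return -10.0
        if self.state == "crash":
            return 0.0
        key = (self.state, event)
        if key not in TRANSITIONS:
            return 0.0  # event irrelevant in this stage: no transition, no reward
        self.state, reward = TRANSITIONS[key]
        return reward


# Usage: one lane change that briefly violates the safe distance.
sd = safe_distance(v_ego=20.0, v_lead=20.0)  # about 30.3 m with the assumed defaults
rm = RewardMachine()
rewards = [
    rm.step(label(40.0, sd, False, False, False)),  # safe cruising
    rm.step(label(40.0, sd, False, True, False)),   # lane change begins
    rm.step(label(25.0, sd, False, False, False)),  # gap drops below SD mid-change
    rm.step(label(40.0, sd, False, False, True)),   # lane change completes
]
print(rewards)  # [0.1, 0.0, -1.0, 1.0]
```

Keeping the rule logic in the transition table, rather than inside a single reward expression, is what lets a reward machine express stage-wise objectives: the gap violation is penalised in every stage, while the completion bonus is only reachable from the `change` state.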

📝 Abstract

Deep reinforcement learning (DRL) offers a compelling route to decision-making for advanced autonomous vehicles (AVs), yet its trial-and-error nature makes it difficult to guarantee safety during training and to achieve both safety and efficiency at deployment. We propose a unified safe reinforcement learning (SRL) framework that integrates safe distance (SD), reward machines (RM), and mixture-of-experts (MoE), termed MoE-RM-SRL. For deployment, SD and RM jointly shape a rule-aware reward that encodes highway traffic regulations and stage-wise objectives, enabling safe and reliable behavior without sacrificing efficiency. For training, we introduce a sparsely gated MoE layer comprising up to 11 deep Q-networks (DQNs); an SD-based gating rule activates a minimal set of experts for lane-keeping and lane-changing, mitigating the instability, discontinuities, and impulsive transients commonly induced by switching between heterogeneous controllers (e.g., MPC/rule-based modules and learned policies). We implement the proposed architecture in CARLA and integrate it with a 6-DoF driver-in-the-loop virtual-reality (DiL-VR) platform. Experiments in stochastic two-lane traffic show that MoE-RM-SRL substantially improves safety and efficiency over state-of-the-art baselines, and the framework naturally extends to multi-lane driving as well as on-ramp merging and exiting scenarios.
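The SD-based gating rule can be sketched as a hand-written gate that decides which experts are evaluated at all. This is a minimal illustration under assumptions, not the paper's architecture: the four expert names, the action set, the gap-based rules, and the constant Q-vectors standing in for trained DQNs are hypothetical (the paper uses up to 11 DQN experts). The active experts' Q-values are combined with uniform weights, in place of the learned top-k softmax gate of a standard sparsely gated MoE.

```python
from typing import Callable, Dict, List, Sequence

QFunction = Callable[[Sequence[float]], List[float]]

# Hypothetical discrete action set shared by all experts.
ACTIONS = ["keep_lane", "change_left", "change_right", "brake"]


def sd_gate(front_gap: float, left_gap: float, right_gap: float, sd: float) -> List[str]:
    """Hypothetical SD-based gating rule: names of the experts to activate this step."""
    if front_gap >= sd:
        return ["cruise"]                 # free road ahead: lane-keeping expert only
    active = ["follow"]                   # too close: car-following expert
    if left_gap >= sd:
        active.append("left_change")      # left lane change is admissible
    if right_gap >= sd:
        active.append("right_change")     # right lane change is admissible
    return active


def moe_q_values(obs: Sequence[float], experts: Dict[str, QFunction],
                 active: List[str]) -> List[float]:
    """Uniformly weighted mixture over the active experts; inactive ones are never evaluated."""
    outputs = [experts[name](obs) for name in active]
    return [sum(qs) / len(outputs) for qs in zip(*outputs)]


def act(obs: Sequence[float], experts: Dict[str, QFunction], front_gap: float,
        left_gap: float, right_gap: float, sd: float) -> str:
    """Greedy action with respect to the gated mixture of Q-values."""
    q = moe_q_values(obs, experts, sd_gate(front_gap, left_gap, right_gap, sd))
    return ACTIONS[max(range(len(q)), key=q.__getitem__)]


# Constant Q-vectors standing in for trained DQN experts (illustrative only).
experts: Dict[str, QFunction] = {
    "cruise": lambda obs: [1.0, 0.2, 0.2, 0.0],
    "follow": lambda obs: [0.1, 0.0, 0.0, 0.8],
    "left_change": lambda obs: [0.0, 1.5, 0.0, 0.2],
    "right_change": lambda obs: [0.0, 0.0, 1.2, 0.2],
}

obs = [0.0]  # placeholder observation; the toy experts ignore it
print(act(obs, experts, front_gap=50.0, left_gap=10.0, right_gap=10.0, sd=30.0))  # keep_lane
print(act(obs, experts, front_gap=20.0, left_gap=40.0, right_gap=10.0, sd=30.0))  # change_left
print(act(obs, experts, front_gap=20.0, left_gap=10.0, right_gap=10.0, sd=30.0))  # brake
```

Because inactive experts are never called, the per-step cost in this sketch scales with the size of the active set rather than with the total number of experts.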

Problem

Research questions and friction points this paper is trying to address.

Safe Reinforcement Learning

Autonomous Driving

Safety

Efficiency

Highway Driving

Innovation

Methods, ideas, or system contributions that make the work stand out.

Safe Reinforcement Learning

Reward Machines

Mixture-of-Experts