🤖 AI Summary
To address the challenge of safely learning autonomous driving policies in real-world road environments, this paper proposes DRLSL, a neuro-symbolic, model-free deep reinforcement learning framework. DRLSL embeds first-order logic safety constraints into the DRL training loop, combining Proximal Policy Optimization (PPO), trajectory modeling trained on the highD dataset, and a real-time safety-aware action masking mechanism. This design enables experience-driven policy optimization and formal safety verification to run concurrently during physical vehicle–environment interactions. Experimental results show zero unsafe actions during both training and testing, 37% faster convergence, and a 22% improvement in cross-scenario generalization accuracy over baseline DRL methods. By bridging the gap between empirical learning and formal safety guarantees, DRLSL addresses a critical safety bottleneck hindering real-world deployment of DRL-based autonomous driving systems and establishes a paradigm for verifiable, trustworthy online policy learning.
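The safety-aware action masking described above can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the state fields, action set, rule predicates, and the 2-second headway threshold are all assumptions chosen for the example. The core idea it shows is that logic-style safety rules are evaluated against the current state before the policy samples an action, so unsafe actions are never executed during learning.

```python
from dataclasses import dataclass

@dataclass
class VehicleState:
    """Simplified ego-vehicle observation (illustrative, not the paper's state space)."""
    gap_ahead_m: float      # distance to the lead vehicle in metres
    speed_mps: float        # current ego speed
    left_lane_free: bool    # whether the adjacent left lane is clear
    right_lane_free: bool   # whether the adjacent right lane is clear

# A discrete high-level action set, a common choice for highway DRL agents.
ACTIONS = ["keep_lane", "accelerate", "brake", "change_left", "change_right"]

MIN_SAFE_HEADWAY_S = 2.0  # hypothetical time-headway threshold (seconds)

def is_safe(action: str, s: VehicleState) -> bool:
    """First-order-logic-style safety predicate: an action is admissible
    only if every rule that applies to it holds in the current state."""
    headway = s.gap_ahead_m / max(s.speed_mps, 0.1)
    if action == "accelerate":
        return headway > MIN_SAFE_HEADWAY_S
    if action == "change_left":
        return s.left_lane_free
    if action == "change_right":
        return s.right_lane_free
    return True  # keep_lane and brake are treated as always admissible here

def safe_action_mask(s: VehicleState) -> list[bool]:
    """Boolean mask applied to the policy's action distribution before sampling,
    so the agent can only ever pick actions the rules certify as safe."""
    return [is_safe(a, s) for a in ACTIONS]
```

In a PPO-style training loop, this mask would zero out (or assign negative infinity logits to) the unsafe entries of the policy's action distribution at every step, during both training and testing.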
📝 Abstract
The dynamic nature of driving environments and the presence of diverse road users pose significant challenges for decision-making in autonomous driving. Deep reinforcement learning (DRL) has emerged as a popular approach to tackle this problem. However, the application of existing DRL solutions is mainly confined to simulated environments due to safety concerns, impeding their deployment in the real world. To overcome this limitation, this paper introduces a novel neuro-symbolic, model-free DRL approach, called DRL with Symbolic Logics (DRLSL), which combines the strengths of DRL (learning from experience) and symbolic first-order logic (knowledge-driven reasoning) to enable safe learning in real-time interactions of autonomous driving within real environments. This approach provides a means to learn autonomous driving policies by actively engaging with the physical environment while ensuring safety. We have implemented the DRLSL framework in autonomous driving using the highD dataset and demonstrated that our method successfully avoids unsafe actions during both the training and testing phases. Furthermore, our results indicate that DRLSL achieves faster convergence during training and exhibits better generalizability to new driving scenarios compared to traditional DRL methods.