🤖 AI Summary
To address the core challenges of poor generalization, insufficient safety guarantees, and opaque decision-making in deep reinforcement learning (DRL) agents for autonomous driving, this paper proposes a novel DRL framework integrating dual-level curriculum learning with safety-driven representation learning. Methodologically: (1) A progressive environmental difficulty escalation mechanism is designed in CARLA, jointly optimizing scenario complexity and multi-source reward balancing; (2) A novel joint modeling approach unifies variational autoencoder (VAE)-based implicit representation learning with proximal policy optimization (PPO) under safety constraints, enhancing both policy robustness and decision interpretability; (3) A dynamic collision penalty mechanism is introduced to enforce safe exploration. Experiments demonstrate that the proposed method reduces collision rates by 42% in unseen dynamic scenarios and improves cross-scenario task success rates by 31%, significantly advancing agent adaptability, safety, and generalization capability.
📝 Abstract
In autonomous driving, traditional Computer Vision (CV) agents often struggle in unfamiliar situations due to biases in the training data. Deep Reinforcement Learning (DRL) agents address this by learning from experience and maximizing rewards, which helps them adapt to dynamic environments. However, ensuring their generalization remains challenging, especially with static training environments. Additionally, DRL models lack transparency, making it difficult to guarantee safety in all scenarios, particularly those not seen during training. To tackle these issues, we propose a method that combines DRL with Curriculum Learning for autonomous driving. Our approach uses a Proximal Policy Optimization (PPO) agent and a Variational Autoencoder (VAE) to learn safe driving in the CARLA simulator. The agent is trained using two-fold curriculum learning, progressively increasing environment difficulty and incorporating a collision penalty in the reward function to promote safety. This method improves the agent's adaptability and reliability in complex environments, and understand the nuances of balancing multiple reward components from different feedback signals in a single scalar reward function. Keywords: Computer Vision, Deep Reinforcement Learning, Variational Autoencoder, Proximal Policy Optimization, Curriculum Learning, Autonomous Driving.