🤖 AI Summary
This work addresses the safety risks of exploration in reinforcement learning for autonomous driving, such as collisions and off-road driving. The authors propose an uncertainty-aware framework that uses expert advice to guide exploration without creating long-term dependence on the expert. Advice is triggered when epistemic or aleatoric uncertainty exceeds an adaptive threshold derived from a rolling buffer, and a commitment-cooldown strategy with a stochastic early-stop heuristic regulates how long and how often the expert intervenes. The approach builds on an off-policy Implicit Quantile Network (IQN) backbone and stores expert and agent experiences in a shared replay buffer. Evaluated in the CARLA simulator for unsignalized intersection navigation, the method improves task success rate by 5-7% over the IQN baseline and reduces failures.
📝 Abstract
Exploration in reinforcement learning for autonomous driving is inherently unsafe: agents must experience novel behaviors to learn, yet exploration can lead to collisions or off-road driving. We propose an uncertainty-aware framework that leverages expert advice to guide exploration while avoiding long-term dependence. Advice is triggered when epistemic or aleatoric uncertainty exceeds adaptive thresholds derived from rolling buffers, ensuring advice evolves with the agent's confidence. A commitment-cooldown strategy with a stochastic early-stop heuristic regulates the duration and frequency of guidance, exposing the agent to coherent maneuvers without exhausting the advice budget. Expert and agent experiences are combined in a shared replay buffer within an off-policy implicit quantile network (IQN) backbone, enabling efficient reuse of expert trajectories. Experiments in CARLA show that our method outperforms the IQN baseline, improving success by 5-7% and reducing failures. These results demonstrate that risk-sensitive uncertainty coupled with regulated expert integration enables safer and more efficient exploration for sensor-based RL policy learning in unsignalized intersection navigation.
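To make the intervention logic concrete, the following is a minimal Python sketch of an uncertainty-triggered advice gate with commitment, cooldown, stochastic early stopping, and an advice budget. It is an illustration under stated assumptions, not the authors' implementation: the class name `AdviceGate`, all parameter names and default values, the use of an empirical quantile of the rolling buffer as the adaptive threshold, and the per-step Bernoulli early stop are hypothetical choices. The abstract does not specify how the thresholds or the early-stop rule are computed. For brevity the sketch also collapses epistemic and aleatoric uncertainty into a single scalar signal.

```python
import random
from collections import deque


class AdviceGate:
    """Hypothetical advice gate; names and defaults are illustrative assumptions."""

    def __init__(self, window=100, quantile=0.9, commit_steps=10,
                 cooldown_steps=20, early_stop_prob=0.05, budget=1000, rng=None):
        self.history = deque(maxlen=window)  # rolling buffer of recent uncertainties
        self.quantile = quantile              # assumed rule: threshold = buffer quantile
        self.commit_steps = commit_steps      # max consecutive expert steps per episode
        self.cooldown_steps = cooldown_steps  # advice-free steps after each episode
        self.early_stop_prob = early_stop_prob
        self.budget = budget                  # total expert steps still available
        self.rng = rng or random.Random(0)
        self.commit_left = 0
        self.cooldown_left = 0

    def threshold(self):
        """Adaptive threshold taken as an empirical quantile of the rolling buffer."""
        if len(self.history) < self.history.maxlen:
            return float("inf")  # warm-up: no advice until the buffer is full
        ordered = sorted(self.history)
        idx = min(len(ordered) - 1, int(self.quantile * len(ordered)))
        return ordered[idx]

    def step(self, uncertainty):
        """Return True if the expert should act on this step."""
        thr = self.threshold()  # computed before the current value enters the buffer
        self.history.append(uncertainty)

        if self.commit_left == 0:
            if self.cooldown_left > 0:
                self.cooldown_left -= 1
                return False
            if self.budget <= 0 or uncertainty <= thr:
                return False
            self.commit_left = self.commit_steps  # start a new advice episode

        # The expert acts on this step and one unit of budget is spent.
        self.commit_left -= 1
        self.budget -= 1
        if (self.commit_left == 0 or self.budget == 0
                or self.rng.random() < self.early_stop_prob):
            # The episode ends (possibly early); enter cooldown.
            self.commit_left = 0
            self.cooldown_left = self.cooldown_steps
        return True
```

In a training loop, a `True` return would mean the expert's action is executed instead of the agent's. As the abstract describes, the resulting transition would be stored in the same replay buffer as agent transitions. Computing the threshold before appending the current uncertainty keeps a single spike from raising its own trigger level.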