Multi-Objective Reinforcement Learning for Critical Scenario Generation of Autonomous Vehicles

📅 2025-02-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Efficiently generating critical scenarios that simultaneously violate multiple safety and functional requirements remains a key challenge in verifying the reliability of autonomous driving systems. Method: This paper proposes MOEQT, a framework that applies dynamically weighted Envelope Q-learning, a multi-objective reinforcement learning (MORL) algorithm, to adaptively balance the importance of heterogeneous requirements. MOEQT operates in closed-loop interaction with a high-fidelity simulation environment and integrates an end-to-end autonomous driving controller for realistic testing. Contribution/Results: Experiments demonstrate that MOEQT significantly improves both the detection rate and diversity of multi-requirement violation scenarios. It outperforms random sampling and single-objective RL baselines in scenario coverage, failure-triggering capability, and cross-objective synergy. MOEQT establishes a scalable, multi-dimensional reliability verification paradigm for autonomous driving systems.

📝 Abstract

Autonomous vehicles (AVs) make driving decisions without human intervention. Therefore, ensuring AVs' dependability is critical. Despite significant research and development on AVs, assuring their dependability remains a significant challenge due to the complexity and unpredictability of their operating environments. Scenario-based testing evaluates AVs under various driving scenarios, but the unlimited number of potential scenarios highlights the importance of identifying critical scenarios that can violate safety or functional requirements. Such requirements are inherently interdependent and need to be tested simultaneously. To this end, we propose MOEQT, a novel multi-objective reinforcement learning (MORL)-based approach to generate critical scenarios that simultaneously test interdependent safety and functional requirements. MOEQT adopts Envelope Q-learning as the MORL algorithm, which dynamically adapts multi-objective weights to balance the relative importance of multiple objectives. MOEQT generates critical scenarios that violate multiple requirements by dynamically interacting with the AV environment, ensuring comprehensive AV testing. We evaluate MOEQT using an advanced end-to-end AV controller and a high-fidelity simulator and compare MOEQT with two baselines: a random strategy and a single-objective RL approach with a weighted reward function. Our evaluation results show that MOEQT achieved overall better performance than the baselines in identifying critical scenarios that violate multiple requirements.
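The abstract names Envelope Q-learning but does not spell out its update. As a rough illustration only, the sketch below shows the general idea of an "envelope" backup in a tabular setting: the Q-function returns one value per objective, and the bootstrap target is chosen by maximizing the preference-weighted value over both next actions and a set of candidate preference vectors. The function name `envelope_target`, the array layout, and the toy numbers are assumptions made for this example; they are not taken from the paper and do not describe MOEQT's actual implementation.

```python
import numpy as np

def envelope_target(reward_vec, q_next, w, gamma=0.9):
    """Vector-valued TD target for one transition under preference w.

    reward_vec: shape (n_obj,), one reward per objective (hypothetical layout).
    q_next:     shape (n_prefs, n_actions, n_obj), the vector Q-values at the
                next state for each candidate preference w' and action a'.
    w:          shape (n_obj,), the preference the target is computed for.
    """
    # Scalarize every candidate (w', a') pair with the current preference w.
    scalarized = q_next @ w  # shape (n_prefs, n_actions)
    # Envelope step: pick the pair with the highest scalarized value.
    pref_idx, act_idx = np.unravel_index(np.argmax(scalarized), scalarized.shape)
    # Bootstrap from the full vector, not from the scalarized number.
    return reward_vec + gamma * q_next[pref_idx, act_idx]

# Toy data: 2 candidate preferences, 2 actions, 2 objectives.
q_next = np.array([[[1.0, 0.0], [0.0, 1.0]],
                   [[2.0, 0.0], [0.0, 3.0]]])
reward_vec = np.array([1.0, 0.0])

balanced = envelope_target(reward_vec, q_next, np.array([0.5, 0.5]))
first_only = envelope_target(reward_vec, q_next, np.array([1.0, 0.0]))
```

With these toy values, changing the preference changes which next-state vector is bootstrapped from, which is what lets a single learned Q-function serve many weightings of the objectives.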

Problem

Research questions and friction points this paper is trying to address.

Identify critical scenarios for autonomous vehicles

Test interdependent safety and functional requirements

Generate scenarios using multi-objective reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Objective Reinforcement Learning

Envelope Q-learning Algorithm

Dynamic Multi-Objective Weighting
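To make the contrast between dynamic weighting and the single-objective baseline concrete, here is a minimal sketch. It assumes that "dynamic" weights are drawn afresh from a uniform Dirichlet distribution over the weight simplex, while the baseline collapses the reward vector with one fixed weight vector. The sampling scheme, the two objective names, and all numeric values are hypothetical choices for this example; this page does not describe how MOEQT actually sets its weights.

```python
import numpy as np

def sample_preference(n_objectives, rng):
    """Draw non-negative weights that sum to 1 (assumed sampling scheme)."""
    return rng.dirichlet(np.ones(n_objectives))

def scalarize(reward_vec, w):
    """Collapse a per-objective reward vector into one number."""
    return float(np.dot(w, reward_vec))

rng = np.random.default_rng(0)

# Hypothetical per-objective rewards, e.g. [collision, route completion].
reward_vec = np.array([1.0, 0.0])

# Baseline style: one fixed weighting chosen before training.
fixed_w = np.array([0.25, 0.75])
baseline_reward = scalarize(reward_vec, fixed_w)

# Dynamic style: a new weighting is sampled, e.g. once per episode.
sampled_w = sample_preference(2, rng)
dynamic_reward = scalarize(reward_vec, sampled_w)
```

A fixed weighting commits to one trade-off up front, whereas resampling exposes the learner to many trade-offs between the requirements during training.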

🔎 Similar Papers

A Review of Reward Functions for Reinforcement Learning in the context of Autonomous Driving