🤖 AI Summary
This work addresses the limitations of existing autonomous driving testing methods, which struggle to generate high-risk, multi-vehicle traffic violation scenarios and lack systematic coverage of traffic regulations. The authors propose ROMAN, a novel framework that introduces, for the first time, a large language model–driven risk-weighted module to quantify traffic laws based on violation severity and occurrence frequency. ROMAN further employs a multi-head attention mechanism to model interactions among vehicles and traffic signals, enabling the generation of comprehensive, high-risk test scenarios in CARLA. Experiments on Baidu Apollo demonstrate that ROMAN produces 7.91% more violation scenarios than ABLE and 55.96% more than LawBreaker, with superior diversity and full coverage of all input regulatory provisions.
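To make the risk-weighting idea concrete, the following is a minimal sketch of how a per-clause weight combining violation severity and occurrence frequency could feed a scenario-level risk score. The function names, the product-based combination, and the numeric ratings are illustrative assumptions; in ROMAN itself, the ratings come from an LLM-driven module whose exact scoring rule the summary does not specify.

```python
# Hypothetical sketch: combine a severity rating and an occurrence-frequency
# rating (here mocked as constants; in ROMAN an LLM assigns them) into one
# risk weight per traffic-law clause, then score a generated scenario by the
# weighted sum of observed violation degrees. All names are illustrative.

def risk_weight(severity: float, frequency: float) -> float:
    """Combine the two rating dimensions (both in [0, 1]) into one weight."""
    return severity * frequency

def scenario_risk(violations: dict, weights: dict) -> float:
    """Weighted sum of per-clause violation degrees for one scenario."""
    return sum(weights[law] * degree for law, degree in violations.items())

# Example with two clauses and hand-picked ratings standing in for LLM output.
weights = {
    "run_red_light": risk_weight(severity=0.9, frequency=0.7),  # 0.63
    "speeding":      risk_weight(severity=0.6, frequency=0.9),  # 0.54
}
violations = {"run_red_light": 1.0, "speeding": 0.5}
score = scenario_risk(violations, weights)  # 0.63*1.0 + 0.54*0.5 = 0.90
```

A score like this would let a search-based generator prefer scenarios that trigger the most safety-critical clauses rather than merely the most violations.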
📝 Abstract
An Automated Driving System (ADS) acts as the brain of an autonomous vehicle, responsible for its safety and efficiency. Safe deployment requires thorough testing in diverse real-world scenarios and compliance with traffic laws such as speed limits, signal obedience, and right-of-way rules. Violations such as running red lights or speeding pose severe safety risks. However, current testing approaches face significant challenges: a limited ability to generate complex, high-risk law-breaking scenarios, and a failure to account for complex interactions involving multiple vehicles and critical situations. To address these challenges, we propose ROMAN, a novel scenario generation approach for ADS testing that combines a multi-head attention network with a traffic law weighting mechanism. ROMAN is designed to generate high-risk violation scenarios to enable more thorough and targeted ADS evaluation. The multi-head attention mechanism models interactions among vehicles, traffic signals, and other factors. The traffic law weighting mechanism implements a workflow that leverages an LLM-based risk weighting module to evaluate violations along two dimensions: severity and occurrence frequency. We evaluated ROMAN by testing the Baidu Apollo ADS within the CARLA simulation platform and conducting extensive experiments to measure its performance. Experimental results demonstrate that ROMAN surpasses the state-of-the-art tools ABLE and LawBreaker, achieving a 7.91% higher average violation count than ABLE and a 55.96% higher count than LawBreaker, while also maintaining greater scenario diversity. In addition, only ROMAN successfully generated violation scenarios for every clause of the input traffic laws, enabling it to identify more high-risk violations than existing approaches.
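The interaction-modeling component can be pictured as standard scaled dot-product multi-head attention applied over per-actor feature vectors (ego vehicle, NPC vehicles, traffic signals). The sketch below, in NumPy, shows only that generic mechanism; the feature dimensions, head count, and the choice of scene elements are assumptions, not ROMAN's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, num_heads):
    """Scaled dot-product multi-head attention over scene-actor features.

    X: (n_actors, d_model), one row per scene element (ego vehicle,
    NPC vehicles, traffic signal). Returns updated features and the
    per-head (n_actors x n_actors) interaction weights.
    """
    n, d = X.shape
    dh = d // num_heads
    def split_heads(W):
        return (X @ W).reshape(n, num_heads, dh).transpose(1, 0, 2)
    Q, K, V = split_heads(Wq), split_heads(Wk), split_heads(Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(dh)   # (heads, n, n)
    attn = softmax(scores, axis=-1)                   # each row sums to 1
    out = (attn @ V).transpose(1, 0, 2).reshape(n, d)
    return out, attn

rng = np.random.default_rng(0)
d_model, heads = 8, 2
# Four hypothetical scene elements: ego, two NPC vehicles, one traffic signal.
X = rng.standard_normal((4, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out, attn = multi_head_attention(X, Wq, Wk, Wv, heads)
```

Here `attn` gives, per head, how strongly each actor attends to every other actor, which is the kind of pairwise interaction signal a scenario generator can exploit when steering NPC behavior toward conflict with the ego vehicle.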