Multi-modal Traffic Scenario Generation for Autonomous Driving System Testing

📅 2025-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the low efficiency and limited fidelity of scenario generation for autonomous driving system (ADS) testing, this paper proposes TrafficComposer, the first high-fidelity traffic scenario generation method that accepts two input modalities: a natural language description and a traffic scene image. Its core innovation is a language–image cross-modal modeling framework that combines multimodal encoders, a cross-modal alignment module, and a scene decoder, so that dynamic semantics and geometric details are recovered together. TrafficComposer is compatible with mainstream simulation platforms, including CARLA and LGSVL. On a 120-scenario benchmark, it achieves 97.0% generation accuracy, 7.3 percentage points higher than the best-performing baseline, and its scenarios directly uncover 37 ADS bugs. When the generated scenarios are used as initial seeds for fuzz testing, two fuzzing methods find 33%–124% more bugs.

📝 Abstract

Autonomous driving systems (ADSs) require extensive testing and validation before deployment. However, constructing traffic scenarios for ADS testing is tedious and time-consuming. In this paper, we propose TrafficComposer, a multi-modal traffic scenario construction approach for ADS testing. TrafficComposer takes as input a natural language (NL) description of a desired traffic scenario and a complementary traffic scene image. It then generates the corresponding traffic scenario in a simulator, such as CARLA or LGSVL. Specifically, TrafficComposer integrates high-level dynamic information about the traffic scenario from the NL description with intricate details about the surrounding vehicles, pedestrians, and the road network from the image. The information from the two modalities is complementary and helps generate high-quality traffic scenarios for ADS testing. On a benchmark of 120 traffic scenarios, TrafficComposer achieves 97.0% accuracy, outperforming the best-performing baseline by 7.3%. Both direct testing and fuzz testing experiments on six ADSs demonstrate the bug detection capabilities of the traffic scenarios generated by TrafficComposer. These scenarios directly discover 37 bugs and, when used as initial seeds, help two fuzzing methods find 33%–124% more bugs.
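
The complementary roles of the two modalities can be illustrated with a minimal Python sketch. It is not TrafficComposer's implementation: the `Actor` and `Scenario` classes, the `compose_scenario` function, and the dictionary keys are all hypothetical, and the sketch assumes that the NL description and the image have already been parsed into simple dictionaries. It only shows the idea of taking dynamic behaviors from the description and static details (actors, positions, road type) from the image.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Actor:
    kind: str                    # e.g. "car" or "pedestrian" (from the image)
    lane: int                    # lane offset relative to the ego vehicle (from the image)
    distance_m: float            # longitudinal distance to the ego vehicle (from the image)
    behavior: str = "keep_lane"  # dynamic behavior (from the NL description)


@dataclass
class Scenario:
    road_type: str
    ego_action: str
    actors: List[Actor] = field(default_factory=list)


def compose_scenario(nl_info: Dict, image_info: Dict) -> Scenario:
    """Merge hypothetical, pre-parsed NL and image information into one scenario."""
    behaviors = nl_info.get("actor_behaviors", {})
    actors = [
        Actor(
            kind=det["kind"],
            lane=det["lane"],
            distance_m=det["distance_m"],
            # Fall back to a default when the description does not mention this actor kind.
            behavior=behaviors.get(det["kind"], "keep_lane"),
        )
        for det in image_info.get("detections", [])
    ]
    return Scenario(
        road_type=image_info.get("road_type", "unknown"),
        ego_action=nl_info.get("ego_action", "keep_lane"),
        actors=actors,
    )


# Illustrative inputs only; the values are made up for this sketch.
nl_info = {"ego_action": "go_straight", "actor_behaviors": {"car": "cut_in"}}
image_info = {
    "road_type": "two_lane_urban",
    "detections": [
        {"kind": "car", "lane": 1, "distance_m": 12.0},
        {"kind": "pedestrian", "lane": 0, "distance_m": 30.0},
    ],
}
scenario = compose_scenario(nl_info, image_info)
```

In this sketch the car inherits the `cut_in` behavior from the description, while the pedestrian, which the description does not mention, keeps the default. A real pipeline would then translate such a structure into simulator-specific calls, which this sketch does not attempt.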

Problem

Research questions and friction points this paper is trying to address.

Constructing traffic scenarios for ADS testing is tedious and time-consuming

A natural language description and a traffic scene image carry complementary information that must be combined during scenario construction

Generated scenarios must be of high enough quality to improve bug detection in ADSs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal input for traffic scenario generation

Natural language and image integration

High accuracy in scenario construction (97.0% on a 120-scenario benchmark)
