DriveSOTIF: Advancing Perception SOTIF Through Multimodal Large Language Models

📅 2025-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Autonomous driving systems struggle to address perception-related SOTIF (Safety of the Intended Functionality) risks in complex scenarios due to insufficient spatial-causal reasoning capabilities. To bridge this gap, we introduce the first SOTIF-oriented, multimodal dataset, coupled with a fine-grained multimodal large language model (MLLM) alignment paradigm and a generalizable SOTIF scenario generation–annotation pipeline. Our method integrates 3D spatiotemporal causal modeling, SOTIF-driven synthetic data generation, and adversarial augmentation. We fine-tune models based on Qwen-VL and LLaVA architectures, achieving significant gains in high-risk scenario detection accuracy on benchmark evaluations. Real-vehicle validation demonstrates precise identification and robust response to edge cases—even those frequently misjudged by human drivers—while maintaining end-to-end latency under 120 ms, satisfying onboard near-real-time deployment requirements.

📝 Abstract

Human drivers naturally possess the ability to perceive driving scenarios, predict potential hazards, and react instinctively due to their spatial and causal intelligence, which allows them to perceive, understand, predict, and interact with the 3D world both spatially and temporally. Autonomous vehicles, however, lack these capabilities, leading to challenges in effectively managing perception-related Safety of the Intended Functionality (SOTIF) risks, particularly in complex and unpredictable driving conditions. To address this gap, we propose an approach that fine-tunes multimodal large language models (MLLMs) on a customized dataset specifically designed to capture perception-related SOTIF scenarios. Model benchmarking demonstrates that this tailored dataset enables the models to better understand and respond to these complex driving situations. Additionally, in real-world case studies, the proposed method correctly handles challenging scenarios that even human drivers may find difficult. Real-time performance tests further indicate the potential for the models to operate efficiently in live driving environments. This approach, along with the dataset generation pipeline, shows significant promise for improving how autonomous driving systems identify, understand, predict, and react to SOTIF-related risks. The dataset and related information are available at https://github.com/s95huang/DriveSOTIF.git

Problem

Research questions and friction points this paper is trying to address.

Mitigating perception-related SOTIF risks in autonomous vehicles using MLLMs

Addressing complex driving scenarios with tailored dataset training

Improving real-time SOTIF risk prediction and reaction in AVs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tunes MLLMs on a custom perception-related SOTIF dataset

Enhances understanding of complex driving scenarios

Indicates, through real-time performance tests, the potential for efficient operation in live driving environments

🔎 Similar Papers

Chrono: A Simple Blueprint for Representing Time in MLLMs