BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems

📅 2025-07-23
🤖 AI Summary
Vision-language models (VLMs) suffer from hallucination in autonomous driving perception—i.e., missing real traffic participants (e.g., pedestrians, cyclists) or generating false positives—thereby compromising ADAS/ADS decision safety. To address this, we propose BetterCheck, a traffic-scene-specific VLM hallucination detection and mitigation framework that systematically distinguishes false alarms from missed detections via fine-grained image analysis and cross-modal semantic consistency verification. Extensive experiments on the Waymo Open Dataset reveal that state-of-the-art VLMs exhibit pervasive hallucination across diverse traffic scenarios. BetterCheck achieves high-precision hallucination identification with low false-negative rates and enables integration with downstream safety-critical modules. Results demonstrate substantial improvements in both reliability and robustness of VLM-based perception under complex, dynamic traffic conditions, advancing trustworthy multimodal understanding for autonomous driving systems.

📝 Abstract
Large language models (LLMs) are increasingly extended to process multimodal data such as text and video simultaneously. Their remarkable performance in understanding what is shown in images surpasses specialized neural networks (NNs) such as YOLO, which support only a well-defined but very limited vocabulary, i.e., the objects that they are able to detect. When unrestricted, LLMs, and in particular state-of-the-art vision language models (VLMs), show impressive performance in describing even complex traffic situations. This makes them potentially suitable components for automotive perception systems to support the understanding of complex traffic situations or edge-case situations. However, LLMs and VLMs are prone to hallucination, which means either failing to see traffic agents, such as vulnerable road users, who are present in a situation, or seeing traffic agents who are not there in reality. While the latter is unwanted, causing an ADAS or autonomous driving system (ADS) to slow down unnecessarily, the former could lead to disastrous decisions by an ADS. In our work, we systematically assess the performance of three state-of-the-art VLMs on a diverse subset of traffic situations sampled from the Waymo Open Dataset to support safety guardrails for capturing such hallucinations in VLM-supported perception systems. We observe that both proprietary and open VLMs exhibit remarkable image understanding capabilities, even paying thorough attention to fine details that are sometimes difficult for humans to spot. However, they are also still prone to making up elements in their descriptions, which to date requires hallucination detection strategies such as BetterCheck, which we propose in our work.
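The hallucination taxonomy in the abstract (traffic agents that a VLM misses versus agents it makes up) can be sketched as a simple set comparison between the agents named in a caption and the ground-truth annotations for the scene. This is a minimal illustrative sketch, not the paper's actual BetterCheck method; the agent vocabulary and all function names are assumptions.

```python
# Hypothetical sketch: flag hallucinations (false positives) and misses
# (false negatives) by comparing agents named in a VLM caption against
# ground-truth labels. Vocabulary and names are illustrative only.

KNOWN_AGENTS = {"car", "pedestrian", "cyclist", "truck", "bus"}

def extract_agents(caption: str) -> set[str]:
    """Naively pull known agent words out of a free-text caption."""
    words = {w.strip(".,").lower() for w in caption.split()}
    # Map simple plurals back to the singular vocabulary entry.
    return {a for a in KNOWN_AGENTS if a in words or a + "s" in words}

def check_caption(caption: str, ground_truth: set[str]) -> dict:
    """Classify caption agents against ground-truth annotations."""
    seen = extract_agents(caption)
    return {
        "confirmed": seen & ground_truth,     # correctly described agents
        "hallucinated": seen - ground_truth,  # false positives (unneeded braking)
        "missed": ground_truth - seen,        # false negatives (safety-critical)
    }

result = check_caption(
    "Two cars and a cyclist wait at the light.",
    ground_truth={"car", "pedestrian"},
)
```

Under this toy setup, the missed pedestrian lands in the safety-critical bucket while the hallucinated cyclist would merely trigger unnecessary caution, mirroring the asymmetry the abstract describes.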
Problem

Research questions and friction points this paper is trying to address.

Assessing VLMs for hallucination in automotive perception
Evaluating VLMs on diverse traffic situations for safety
Proposing BetterCheck to detect VLM hallucinations effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematically assessing VLMs on traffic situations
Proposing BetterCheck for hallucination detection
Evaluating VLMs using Waymo Open Dataset samples
Malsha Ashani Mahawatta Dona
University of Gothenburg and Chalmers University of Technology
Beatriz Cabrero-Daniel
Chalmers University of Technology and University of Gothenburg
Yinan Yu
University of Gothenburg and Chalmers University of Technology
Christian Berger
University of Gothenburg and Chalmers University of Technology