Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models

📅 2025-04-04

🤖 AI Summary

Problem: Existing vision-language models (VLMs), including LLaVA, MiniGPT-4, and Qwen-VL, exhibit poor uncertainty calibration under image degradation (e.g., noise, blur, occlusion), leading to overconfident and unreliable predictions.

Method: We propose a natural-language-generation-based interpretable uncertainty quantification framework that jointly leverages CLIP-ViT and large language models (LLMs). It integrates confidence analysis, expected calibration error (ECE) measurement, and text-based uncertainty reporting for multi-dimensional assessment.

Contribution/Results: Empirical evaluation reveals that ECE increases by 37% on average across all models under degradation, with over 92% of incorrect answers assigned high confidence. Our method significantly improves both the interpretability and calibration accuracy of uncertainty estimates, establishing a novel paradigm for deploying trustworthy VLMs in real-world, imperfect visual conditions.
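The expected calibration error (ECE) mentioned above can be illustrated with a minimal sketch. It assumes the common equal-width-binning definition (the weighted average of the gap between accuracy and mean confidence in each bin); the function name, the bin count of 10, and the binning scheme are illustrative assumptions, not details confirmed by the paper.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (an assumed, standard definition).

    confidences: stated confidence per answer, each in [0, 1].
    correct:     1 if the answer was right, 0 otherwise.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Map each confidence to a bin index; a confidence of exactly 1.0
    # is placed in the last bin instead of overflowing.
    bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        # Gap between observed accuracy and mean confidence in this bin.
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        # Weight the gap by the fraction of samples that fall in the bin.
        ece += mask.mean() * gap
    return float(ece)
```

Under this definition, a model that states 0.9 confidence on four answers and gets three right has an ECE of about 0.15, while a model that is always fully confident and always wrong has an ECE of 1.0.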

📝 Abstract

To leverage the full potential of Large Language Models (LLMs), it is crucial to have some information on the uncertainty of their answers. This means that the model has to be able to quantify how certain it is of the correctness of a given response. Poor uncertainty estimates can lead to overconfident wrong answers, undermining trust in these models. Considerable research has been done on language models that take text inputs and produce text outputs. However, because visual capabilities were added to these models only recently, there has been little progress on uncertainty estimation for Vision-Language Models (VLMs). We tested three state-of-the-art VLMs on corrupted image data. We found that the severity of the corruption negatively affected the models' ability to estimate their uncertainty, and the models were overconfident in most of the experiments.
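As a rough illustration of what verbalized uncertainty looks like in practice, the sketch below extracts a stated confidence from a model response and measures overconfidence as mean confidence minus accuracy. The response format ("Confidence: 85%"), both function names, and the overconfidence measure are assumptions made for illustration; the paper's actual prompts and metrics may differ.

```python
import re


def parse_verbalized_confidence(response):
    """Return the stated confidence as a value in [0, 1], or None if absent.

    Assumes a hypothetical response format such as 'Confidence: 85%'.
    """
    match = re.search(
        r"confidence\s*[:=]?\s*(\d{1,3}(?:\.\d+)?)\s*%",
        response,
        re.IGNORECASE,
    )
    if match is None:
        return None
    value = float(match.group(1)) / 100.0
    # Clamp out-of-range statements such as '150%'.
    return min(max(value, 0.0), 1.0)


def overconfidence_gap(confidences, correct):
    """Mean stated confidence minus accuracy; positive means overconfident."""
    if not confidences:
        raise ValueError("need at least one prediction")
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(correct) / len(correct)
    return mean_confidence - accuracy
```

For example, two wrong answers each stated with 90% confidence give a gap of 0.9, the kind of overconfident wrong answer the abstract warns about.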

Problem

Research questions and friction points this paper is trying to address.

Assessing uncertainty in Vision-Language Models on corrupted images

Evaluating overconfidence in VLMs under image corruption

Improving robustness of verbalized uncertainty estimation in VLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

VLMs tested on corrupted image data

Uncertainty estimation in visual-language models

Overconfidence analysis in VLMs
