🤖 AI Summary
This paper identifies a novel instability phenomenon in multimodal large language models (MLLMs): under misleading prompts, MLLMs reverse their initially correct answers in 65% of cases, a critical vulnerability overlooked by prior work on robustness. Method: To systematically evaluate resistance to misdirection, the authors propose a two-stage misleading-response contrastive paradigm and introduce MUB, the first multimodal uncertainty benchmark, which covers diverse domains and incorporates both explicit and implicit misleading cues. Two quantitative metrics, the misleading rate and the response shift, assess answer consistency. Leveraging misleading instruction engineering, response consistency analysis, two-stage sampling, and fine-tuning with injected misleading data, the authors enhance model robustness. Contribution/Results: State-of-the-art MLLMs exhibit baseline misleading rates exceeding 86%, which drop substantially after fine-tuning. The MUB benchmark and code are publicly released.
📝 Abstract
Ensuring that Multimodal Large Language Models (MLLMs) maintain consistency in their responses is essential for developing trustworthy multimodal intelligence. However, existing benchmarks include many samples on which all MLLMs *exhibit high response uncertainty when encountering misleading information*, requiring as many as 5-15 response attempts per sample to effectively assess uncertainty. Therefore, we propose a two-stage pipeline: first, we collect MLLMs' responses without misleading information, and then gather misleading ones via specific misleading instructions. By calculating the misleading rate, and capturing both correct-to-incorrect and incorrect-to-correct shifts between the two sets of responses, we can effectively quantify the model's response uncertainty. Eventually, we establish a **M**ultimodal **U**ncertainty **B**enchmark (**MUB**) that employs both explicit and implicit misleading instructions to comprehensively assess the vulnerability of MLLMs across diverse domains. Our experiments reveal that all open-source and closed-source MLLMs are highly susceptible to misleading instructions, with an average misleading rate exceeding 86%. To enhance the robustness of MLLMs, we further fine-tune all open-source MLLMs by incorporating explicit and implicit misleading data, which demonstrates a significant reduction in misleading rates. Our code is available at: [https://github.com/Yunkai696/MUB](https://github.com/Yunkai696/MUB)
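The two-stage evaluation can be sketched as follows. This is a minimal illustration, not the paper's released code: it assumes answers are comparable labels, and computes the misleading rate (correct-to-incorrect shifts over initially correct answers) plus the reverse incorrect-to-correct shift rate; the function name and inputs are hypothetical.

```python
def consistency_metrics(ground_truth, initial, misled):
    """Compare answers before and after misleading instructions.

    ground_truth, initial, misled: parallel lists of answer labels,
    where `initial` holds stage-1 responses (no misleading info) and
    `misled` holds stage-2 responses (with misleading instructions).
    Returns (misleading_rate, recovery_rate), each relative to the
    pool of initially correct / initially incorrect samples.
    """
    c2i = i2c = n_correct = n_incorrect = 0
    for gt, first, second in zip(ground_truth, initial, misled):
        if first == gt:
            n_correct += 1
            if second != gt:      # correct-to-incorrect shift
                c2i += 1
        else:
            n_incorrect += 1
            if second == gt:      # incorrect-to-correct shift
                i2c += 1
    misleading_rate = c2i / n_correct if n_correct else 0.0
    recovery_rate = i2c / n_incorrect if n_incorrect else 0.0
    return misleading_rate, recovery_rate
```

For example, if a model initially answers three of four questions correctly and the misleading instruction flips two of those three, the misleading rate is 2/3, independent of how the initially wrong answer moves.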