When Correct Decisions Hide Internal Stress: Decision-State Probing in Multimodal Language Models

📅 2026-06-06

🤖 AI Summary

This work addresses a limitation of evaluations that rely solely on external behavior: they do not reveal whether a multimodal language model's internal decision state stays stable under semantic stress. It proposes S³E (Structured Semantic Stress Evaluation), which uses a positive-anchored A/B forced-choice paradigm to compare the hidden-state displacement induced by semantic-stress candidates against that induced by meaning-preserving controls, restricted to trials where the model answers correctly under both option orders. The resulting selected-layer excess displacement is interpreted as a scoped decision-state stress-sensitivity signal, not as evidence of downstream failure or hallucination. On Qwen3-VL, Gemma3, and InternVL3, semantic stress consistently produces positive excess displacement over lexical controls, while comparisons against random negatives are model-dependent. Correct forced-choice outputs therefore do not certify invariant internal decision geometry.

📝 Abstract

Multimodal language models are typically evaluated through external behavior: selecting the correct image-text match, rejecting unsupported captions, or answering visual queries correctly. However, correct behavior alone does not show that the model's internal decision state remains stable under controlled semantic stress. We study this gap through S³E (Structured Semantic Stress Evaluation), a framework for analyzing behavior-internal decoupling in multimodal language models. S³E uses a positive-anchored A/B forced-choice setup in which an image-supported caption is contrasted against semantic stress candidates under both original and swapped option orders, while hidden states are extracted at the pre-answer decision state. We focus on strict-correct trials, where the model consistently selects the correct caption across both orders. Rather than treating arbitrary hidden-state variation as evidence of instability, we measure whether semantic-conflict candidates induce excess decision-state displacement relative to meaning-preserving controls. Across Qwen3-VL, Gemma3, and InternVL3, semantic stress consistently produces positive selected-layer excess displacement over lexical controls despite correct forced-choice behavior, while comparisons against random negatives are model-dependent. We interpret this as a scoped decision-state stress-sensitivity signal rather than evidence of downstream failure or hallucination. Our results suggest that forced-choice correctness alone is not a sufficient certificate of invariant internal decision geometry.
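
The excess-displacement comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance function or the reference state, so the sketch assumes cosine distance at a single selected layer and a hypothetical per-trial reference vector `h_ref`. The field names (`h_stress`, `h_control`, `choice_original`, and so on) are likewise illustrative assumptions.

```python
import numpy as np


def cosine_displacement(a, b):
    """Return 1 - cosine similarity between two hidden-state vectors.

    Assumption: cosine distance stands in for the displacement measure,
    which the abstract does not specify.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def is_strict_correct(trial):
    """A trial is strict-correct if the model picks the supported caption
    under both the original and the swapped option order."""
    return (
        trial["choice_original"] == trial["correct_original"]
        and trial["choice_swapped"] == trial["correct_swapped"]
    )


def mean_excess_displacement(trials):
    """Mean of (stress displacement - control displacement) over
    strict-correct trials; NaN if no trial qualifies.

    Each trial is a dict with hypothetical keys:
      h_ref      pre-answer state for the reference condition (assumed)
      h_stress   pre-answer state with a semantic-stress candidate
      h_control  pre-answer state with a meaning-preserving control
    """
    excess = []
    for trial in trials:
        if not is_strict_correct(trial):
            continue  # behaviorally incorrect or order-sensitive trials are excluded
        d_stress = cosine_displacement(trial["h_ref"], trial["h_stress"])
        d_control = cosine_displacement(trial["h_ref"], trial["h_control"])
        excess.append(d_stress - d_control)
    return float(np.mean(excess)) if excess else float("nan")
```

A positive mean under these assumptions would correspond to the paper's reported pattern: stress candidates move the decision state further than lexical controls do, even though the selected answer is correct in both option orders.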

Problem

Research questions and friction points this paper is trying to address.

multimodal language models

internal decision state

semantic stress

behavior-internal decoupling

decision-state stability

Innovation

Methods, ideas, or system contributions that make the work stand out.

structured semantic stress

decision-state probing

multimodal language models