Encoded but Not Routed: Explaining the Table-Chart Gap in Scientific Claim Verification

📅 2026-06-01

🤖 AI Summary
This study investigates why multimodal large language models (MLLMs) verify scientific claims substantially less accurately when the evidence is a chart than when it is a table of the same underlying data. Using layer-wise linear probing and attention analysis on three open-weight vision-language models, the authors compare how these models process identical data presented as tables versus charts. They find that chart information is encoded in the models' intermediate representations but does not reach the position where the prediction is formed, a gap that is absent for tables. The table-chart gap therefore stems from a routing failure at prediction time rather than from insufficient encoding, and attention analysis shows that this failure takes two architecturally distinct forms across model families.
📝 Abstract
Multimodal LLMs are increasingly used to assist scientific peer review, where a core requirement is verifying whether claims in a paper are supported by its evidence. Prior work has shown that models perform substantially better at this task when the evidence is a table than when it is a chart of the same underlying data. This raises a question: do models fail to extract information from charts, or do they extract it but fail to use it when forming their prediction? We study this question through layer-wise linear probing and attention analysis on three open-weight VLMs, using table and chart evidence that represent the same underlying data. We find consistent evidence for the latter. Chart information is encoded in the models' intermediate representations but does not reach the prediction position, a gap that is absent for tables and holds across all conditions tested. Attention analysis further reveals that this disconnect takes two architecturally distinct forms across model families. These findings reframe the table-chart gap as a failure of how encoded visual information is routed at prediction time, rather than a failure of encoding itself.
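To make the "encoded but not routed" distinction concrete, here is a minimal sketch of layer-wise linear probing. It is an illustration under stated assumptions, not the authors' implementation: the hidden states are synthetic stand-ins for real VLM activations, the ridge-regression probe is one common choice (the abstract does not specify the probe), and every function name and parameter below is hypothetical. The toy data is built so that the label is linearly decodable at the evidence position in later layers but never at the prediction position, which is the pattern the abstract describes for charts.

```python
import numpy as np


def fit_linear_probe(X, y, l2=1e-2):
    """Fit a ridge-regression probe on +/-1 targets; returns weights plus bias."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    targets = 2.0 * y - 1.0                        # map {0, 1} -> {-1, +1}
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ targets)


def probe_accuracy(w, X, y):
    """Classification accuracy of a fitted probe on held-out states."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean((Xb @ w > 0).astype(int) == y))


def synthetic_hidden_states(y, dim, signal_per_layer, rng):
    """ASSUMPTION: toy activations of shape (n_layers, n_examples, dim).

    Each layer is Gaussian noise plus a label-dependent shift along one
    random unit direction, scaled by that layer's signal strength.
    """
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    states = rng.normal(size=(len(signal_per_layer), len(y), dim))
    for layer, strength in enumerate(signal_per_layer):
        states[layer] += strength * (2 * y - 1)[:, None] * direction
    return states


def layerwise_probe(states, y, train_frac=0.7):
    """Train one probe per layer and report held-out accuracy for each."""
    split = int(train_frac * len(y))
    accuracies = []
    for layer_states in states:
        w = fit_linear_probe(layer_states[:split], y[:split])
        accuracies.append(probe_accuracy(w, layer_states[split:], y[split:]))
    return accuracies


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=600)  # toy "claim supported / not supported" labels

# Evidence position: the label becomes decodable in later layers ("encoded").
evidence = synthetic_hidden_states(y, 16, [0.0, 1.5, 3.0, 3.0], rng)
# Prediction position: the label never shows up ("not routed").
prediction = synthetic_hidden_states(y, 16, [0.0, 0.0, 0.0, 0.0], rng)

evidence_acc = layerwise_probe(evidence, y)
prediction_acc = layerwise_probe(prediction, y)

for layer, (e, p) in enumerate(zip(evidence_acc, prediction_acc)):
    print(f"layer {layer}: evidence probe {e:.2f}, prediction probe {p:.2f}")
```

With real models, `evidence` and `prediction` would be replaced by activations extracted at the evidence tokens and at the final prediction token. High probe accuracy at the former together with near-chance accuracy at the latter is what would distinguish a routing failure from an encoding failure.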
Problem

Research questions and friction points this paper is trying to address.

scientific claim verification
table-chart gap
multimodal LLMs
visual information encoding
evidence routing
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal LLMs
scientific claim verification
layer-wise probing
attention analysis
information routing