🤖 AI Summary
This work addresses the scarcity of high-quality, realistic, and diverse multimodal datasets containing refuted claims for scientific claim verification. To bridge this gap, the authors introduce SciClaimEval, a novel multimodal dataset of 1,664 samples spanning machine learning, natural language processing, and medical domains. Rather than modifying textual claims, SciClaimEval generates refuted claims by altering the supporting evidence (figures and tables) in the original papers, and provides that evidence in multiple representations: figures as images, and tables as images, LaTeX, HTML, and JSON. The dataset is constructed through a rigorous multimodal processing pipeline and validated by expert annotation. Comprehensive benchmarking of eleven open- and closed-source multimodal models reveals that current systems still fall significantly short of human performance, with figure-based verification proving particularly challenging.
📝 Abstract
We present SciClaimEval, a new scientific dataset for the claim verification task. Unlike existing resources, SciClaimEval features authentic claims, including refuted ones, extracted directly from published papers. To create refuted claims, we introduce a novel approach that modifies the supporting evidence (figures and tables) rather than altering the claims or relying on large language models (LLMs) to fabricate contradictions. The dataset provides cross-modal evidence with diverse representations: figures are available as images, while tables are provided in multiple formats, including images, LaTeX source, HTML, and JSON. SciClaimEval contains 1,664 annotated samples from 180 papers across three domains: machine learning, natural language processing, and medicine, all validated through expert annotation. We benchmark 11 multimodal foundation models, both open-source and proprietary, on the dataset. Results show that figure-based verification remains particularly challenging for all models, with a substantial performance gap between the best system and the human baseline.
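To make the described evidence format concrete, here is a minimal sketch of what a single SciClaimEval record might look like, assuming a hypothetical layout: the field names (`paper_id`, `evidence`, `label`, etc.) are illustrative guesses, not the dataset's actual schema, and only the representation split (figures as images; tables additionally as LaTeX, HTML, and JSON) is taken from the abstract.

```python
# Hypothetical sketch of a SciClaimEval record; all field names are
# assumptions for illustration, not the paper's published schema.

sample = {
    "paper_id": "example-0001",          # hypothetical identifier
    "domain": "machine_learning",        # machine learning, NLP, or medicine
    "claim": "Model A outperforms Model B on benchmark X.",
    "label": "refuted",                  # "supported" or "refuted"
    "evidence": {
        "type": "table",                 # "figure" or "table"
        "image": "tables/0001.png",      # both figures and tables ship as images
        "latex": "\\begin{tabular}...",  # tables additionally ship as LaTeX,
        "html": "<table>...</table>",    # HTML,
        "json": {"rows": [["A", 0.91]]}, # and JSON
    },
}

def evidence_views(record: dict) -> list[str]:
    """List the representations available for a record's evidence."""
    views = ["image"]  # per the abstract, every evidence item has an image form
    if record["evidence"]["type"] == "table":
        views += ["latex", "html", "json"]  # extra formats exist only for tables
    return views

print(evidence_views(sample))  # ['image', 'latex', 'html', 'json']
```

A layout along these lines would let a benchmark harness pick the evidence representation to feed each model, which matters here because the reported gap is largest on figure-based items, where only the image view is available.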