Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning

πŸ“… 2026-03-27
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the limitations of existing academic reasoning approaches, which rely predominantly on goal-directed retrieval and struggle to achieve coherent understanding and verification across entire scholarly papers. We introduce ScholScan, a benchmark that pioneers a scan-based academic reasoning paradigm, requiring models to read full papers and cross-verify their content to identify inconsistencies. ScholScan encompasses 715 papers across 13 natural-science domains, offering 1,800 fine-grained annotated questions with reasoning trajectories targeting nine types of consistency errors. We evaluate 15 multimodal large language models (MLLMs) under 24 input configurations, supported by evidence-localization annotations, chain-of-thought annotations, and a unified evaluation protocol. Results show that current retrieval-augmented generation (RAG) methods yield no significant improvement, exposing systematic deficiencies of MLLMs in scan-based reasoning and underscoring ScholScan's difficulty and its potential to guide future research.
πŸ“ Abstract
With the rapid progress of multimodal large language models (MLLMs), AI already performs well at literature retrieval and certain reasoning tasks, serving as a capable assistant to human researchers, yet it remains far from autonomous research. The fundamental reason is that current work on academic paper reasoning is largely confined to a search-oriented paradigm centered on pre-specified targets, with reasoning grounded in relevance retrieval, which struggles to support researcher-style full-document understanding, reasoning, and verification. To bridge this gap, we propose ScholScan, a new benchmark for academic paper reasoning. ScholScan introduces a scan-oriented task setting that asks models to read and cross-check entire papers like human researchers, scanning each document to identify consistency issues. The benchmark comprises 1,800 carefully annotated questions drawn from nine error categories across 13 natural-science domains and 715 papers, and provides detailed annotations for evidence localization and reasoning traces, together with a unified evaluation protocol. We assessed 15 models across 24 input configurations and conducted a fine-grained analysis of MLLM capabilities for every error category. Across the board, retrieval-augmented generation (RAG) methods yield no significant improvements, revealing systematic deficiencies of current MLLMs on scan-oriented tasks and underscoring the challenge posed by ScholScan. We expect ScholScan to serve as a leading and representative benchmark for the scan-oriented task paradigm.
Problem

Research questions and friction points this paper is trying to address.

scan-oriented reasoning
academic paper understanding
multimodal large language models
full-document reasoning
consistency verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

scan-oriented reasoning
multimodal large language models
academic paper benchmark
full-document understanding
consistency verification
Rongjin Li
Xiamen University, VoiceAI
speaker recognition · speech enhancement · deep learning
Zichen Tang
Beijing University of Posts and Telecommunications
Xianghe Wang
Beijing University of Posts and Telecommunications
Xinyi Hu
Beijing University of Posts and Telecommunications
Zhengyu Wang
Huazhong University of Science and Technology
reconfigurable intelligent surfaces · random matrix theory
Zhengyu Lu
Beijing University of Posts and Telecommunications
Yiling Huang
PhD Student in Statistics, University of Michigan
Selective Inference
Jiayuan Chen
Beijing University of Posts and Telecommunications
Weisheng Tan
Beijing University of Posts and Telecommunications
Jiacheng Liu
Beijing University of Posts and Telecommunications
Zhongjun Yang
Beijing University of Posts and Telecommunications
Haihong E
Beijing University of Posts and Telecommunications