Navigating Large-Scale Document Collections: MuDABench for Multi-Document Analytical QA

📅 2026-04-24

🤖 AI Summary
Existing systems struggle with cross-document quantitative analysis and synthesis over large, semi-structured document collections. This work formally defines the multi-document analytical question answering task and introduces MuDABench, a benchmark comprising over 80,000 document pages and 332 question-answer instances, constructed via distant supervision from document-level metadata and annotated financial databases. Evaluation measures final answer accuracy and uses intermediate-fact coverage as an auxiliary diagnostic of the reasoning process. Standard RAG systems, which treat all documents as a flat retrieval pool, perform poorly. The authors propose a multi-agent workflow that orchestrates planning, information extraction, and code generation; it substantially improves both metrics but still falls well short of human experts. The analysis attributes the gap to two bottlenecks: single-document extraction accuracy and insufficient domain-specific knowledge.
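
To make the plan, extract, and aggregate idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Document` fields, the `plan`, `extract`, and `aggregate` helpers, and the toy "field: value" document format are all hypothetical, and the extraction step is a plain string parser standing in for what would be a model call in a real system.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Document:
    company: str  # hypothetical metadata field
    year: int     # hypothetical metadata field
    text: str


def plan(docs: list, years: set) -> list:
    """Planning step (assumed): select documents by metadata, not by similarity search."""
    return [d for d in docs if d.year in years]


def extract(doc: Document, field: str) -> Optional[float]:
    """Extraction step (stand-in for a model call): read one 'field: value' line."""
    for line in doc.text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == field.lower():
            return float(value.strip())
    return None


def aggregate(facts: dict) -> float:
    """Stand-in for generated analysis code: average the extracted values."""
    return mean(facts.values())


# Toy collection: three filings, two of them from the year the question asks about.
docs = [
    Document("A", 2022, "revenue: 100\nprofit: 10"),
    Document("B", 2022, "revenue: 300"),
    Document("A", 2021, "revenue: 80"),
]

selected = plan(docs, years={2022})
extracted = {d.company: extract(d, "revenue") for d in selected}
facts = {company: value for company, value in extracted.items() if value is not None}
answer = aggregate(facts)
print(facts, answer)
```

The point of the sketch is the shape of the computation: each relevant document is processed on its own, and the per-document facts are then combined by ordinary code, instead of retrieving a handful of passages from one flat pool.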


📝 Abstract
This paper introduces the task of analytical question answering over large, semi-structured document collections. We present MuDABench, a benchmark for multi-document analytical QA, where questions require extracting and synthesizing information across numerous documents to perform quantitative analysis. Unlike existing multi-document QA benchmarks that typically require information from only a few documents with limited cross-document reasoning, MuDABench demands extensive inter-document analysis and aggregation. Constructed via distant supervision by leveraging document-level metadata and annotated financial databases, MuDABench comprises over 80,000 pages and 332 analytical QA instances. We also propose an evaluation protocol that measures final answer accuracy and uses intermediate-fact coverage as an auxiliary diagnostic signal for the reasoning process. Experiments reveal that standard RAG systems, which treat all documents as a flat retrieval pool, perform poorly. To address these limitations, we propose a multi-agent workflow that orchestrates planning, extraction, and code generation modules. While this approach substantially improves both process and outcome metrics, a significant gap remains compared to human expert performance. Our analysis identifies two primary bottlenecks: single-document information extraction accuracy and insufficient domain-specific knowledge in current systems. MuDABench is available at https://github.com/Zhanli-Li/MuDABench.
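
The two evaluation signals named in the abstract can be illustrated with a short Python sketch. The exact definitions used by MuDABench are not given here, so everything below is an assumption: facts are keyed by a hypothetical (company, year) pair, a fact counts as covered when the predicted value is within a 1% relative tolerance of the gold value, and the final answer is judged with the same tolerance.

```python
import math


def fact_coverage(predicted: dict, gold: dict, rel_tol: float = 0.01) -> float:
    """Assumed definition: share of gold intermediate facts matched by the system."""
    if not gold:
        return 1.0
    hits = sum(
        1
        for key, value in gold.items()
        if key in predicted and math.isclose(predicted[key], value, rel_tol=rel_tol)
    )
    return hits / len(gold)


def answer_correct(predicted: float, gold: float, rel_tol: float = 0.01) -> bool:
    """Assumed definition: final numeric answer within a relative tolerance."""
    return math.isclose(predicted, gold, rel_tol=rel_tol)


# Hypothetical gold facts and system output for one question.
gold_facts = {("A", 2022): 100.0, ("B", 2022): 300.0, ("C", 2022): 50.0}
predicted_facts = {("A", 2022): 100.0, ("B", 2022): 310.0}  # B is off, C is missing

coverage = fact_coverage(predicted_facts, gold_facts)
correct = answer_correct(predicted=205.0, gold=150.0)
print(f"coverage={coverage:.2f}, correct={correct}")
```

Reporting coverage next to accuracy helps separate two failure modes: an answer that is wrong because facts were missed or misread, and one that is wrong despite complete facts because the final computation went astray.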
Problem

Research questions and friction points this paper is trying to address.

analytical question answering
multi-document QA
cross-document reasoning
information aggregation
large-scale document collections
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-document QA
Analytical Reasoning
MuDABench
Distant Supervision
Multi-agent Workflow
Zhanli Li
Wenlan School of Business, Zhongnan University of Economics and Law, Wuhan 430073, China; State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Yixuan Cao
Shenzhen University
Software Engineering, Security, Kernel & Compiler, Testing & Verification, Big Data
Lvzhou Luo
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
Ping Luo
National University of Defense Technology
Distributed Computing