More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the *independent effect of document count* on LLM performance in RAG, isolating this variable for the first time under *fixed context length* and *fixed relevant-passage position*. It shows that multi-document processing poses a *distinct challenge*, orthogonal to long-context understanding, that impairs LLM reasoning regardless of model scale or architecture: accuracy degrades significantly as the document count increases, consistently across multiple open-source LLMs. To enable rigorous analysis, the authors introduce the first controllable *multi-hop QA benchmark* supporting independent control over document count, number of hops, and passage position, and they conduct systematic ablation studies to disentangle these factors. All code and data are publicly released, establishing a reproducible foundation for advancing RAG robustness research.

📝 Abstract
Retrieval-augmented generation (RAG) provides LLMs with relevant documents. Although previous studies noted that retrieving many documents can degrade performance, they did not isolate how the quantity of documents affects performance while controlling for context length. We evaluate various language models on custom datasets derived from a multi-hop QA task. We keep the context length and position of relevant information constant while varying the number of documents, and find that increasing the document count in RAG settings poses significant challenges for LLMs. Additionally, our results indicate that processing multiple documents is a separate challenge from handling long contexts. We also make the datasets and code available: https://github.com/shaharl6000/MoreDocsSameLen.
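The controlled setup the abstract describes (fixed total context length, fixed relevant-passage position, varying document count) can be sketched as follows. This is an illustrative reconstruction, not the authors' released code, and it uses character counts as a stand-in for token counts:

```python
def build_context(relevant: str, distractor_text: str,
                  n_docs: int, total_len: int) -> list[str]:
    """Assemble exactly n_docs documents with a fixed total length.

    The relevant passage always sits at position 0; the remaining
    length budget is split evenly among n_docs - 1 distractor
    documents, so only the document count differs between conditions.
    """
    budget = total_len - len(relevant)
    if budget < 0 or n_docs < 1:
        raise ValueError("total_len too small or n_docs invalid")
    filler = distractor_text[:budget]
    docs = [relevant]
    n_fill = n_docs - 1
    for i in range(n_fill):
        chunk = len(filler) // n_fill
        start = i * chunk
        # The last distractor absorbs any rounding remainder.
        end = start + chunk if i < n_fill - 1 else len(filler)
        docs.append(filler[start:end])
    return docs
```

Under this scheme, a 2-document context and a 10-document context contain the same volume of text, so any accuracy gap between them is attributable to the document count alone.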
Problem

Research questions and friction points this paper is trying to address.

Investigates impact of document quantity on RAG performance.
Isolates document count effect from context length in RAG.
Explores challenges of multiple documents versus long contexts.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Isolates the impact of document quantity on RAG performance.
Evaluates LLMs while holding context length constant.
Releases datasets and code for further research.
Shahar Levy
CS MSc, Hebrew University of Jerusalem
NLP
Nir Mazor
School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
Lihi Shalmon
School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
Michael Hassid
Meta FAIR, Hebrew University of Jerusalem
Natural Language Processing · Speech · Artificial Intelligence
Gabriel Stanovsky
The Hebrew University of Jerusalem
Computational Linguistics