Classifying and Addressing the Diversity of Errors in Retrieval-Augmented Generation Systems

📅 2025-10-15
🤖 AI Summary
Problem: Real-world RAG systems suffer from heterogeneous error types that are difficult to diagnose during deployment. Method: This paper introduces the first production-oriented RAG error taxonomy, categorizing failures along four dimensions (retrieval, generation, alignment, and hallucination), and constructs RAG-ErrorBank, the first large-scale, human-annotated dataset of RAG error types, publicly released. It further designs an automated error detection and evaluation framework strictly aligned with the taxonomy, enabling fine-grained error localization and robustness quantification. Contributions/Results: Experiments demonstrate a +28.6% improvement in error identification accuracy over baselines. The framework provides interpretable, reusable diagnostic pathways for systematic debugging and optimization. All components, including source code, RAG-ErrorBank, and evaluation tools, are fully open-sourced to foster reproducible research and practical deployment.

📝 Abstract
Retrieval-augmented generation (RAG) is a prevalent approach for building LLM-based question-answering systems that can take advantage of external knowledge databases. Due to the complexity of real-world RAG systems, there are many potential causes for erroneous outputs. Understanding the range of errors that can occur in practice is crucial for robust deployment. We present a new taxonomy of the error types that can occur in realistic RAG systems, examples of each, and practical advice for addressing them. Additionally, we curate a dataset of erroneous RAG responses annotated by error types. We then propose an auto-evaluation method aligned with our taxonomy that can be used in practice to track and address errors during development. Code and data are available at https://github.com/layer6ai-labs/rag-error-classification.
Problem

Research questions and friction points this paper is trying to address.

Classifying diverse error types in retrieval-augmented generation systems
Providing practical solutions to address RAG system errors
Developing automated evaluation methods for error tracking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Taxonomy for classifying RAG system error types
Dataset of annotated erroneous RAG responses
Auto-evaluation method tracking errors during development
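
Tracking errors by type during development can be pictured as a simple tally over annotated responses. The following is a minimal sketch under stated assumptions, not the authors' evaluation method: the four labels are placeholders borrowed from the dimensions named in the AI summary (the paper's actual taxonomy may differ or be finer-grained), and `AnnotatedResponse` and `error_distribution` are hypothetical names introduced only for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Placeholder labels (assumption): taken from the four dimensions named in
# the AI summary; the paper's real taxonomy may use different categories.
ERROR_TYPES = ("retrieval", "generation", "alignment", "hallucination")


@dataclass(frozen=True)
class AnnotatedResponse:
    """Hypothetical record: one erroneous RAG response with its error label."""
    question: str
    response: str
    error_type: str  # expected to be one of ERROR_TYPES


def error_distribution(records):
    """Return the fraction of annotated erroneous responses per error type."""
    counts = Counter()
    for record in records:
        if record.error_type not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {record.error_type!r}")
        counts[record.error_type] += 1
    total = sum(counts.values())
    if total == 0:
        # No annotated errors yet: report zero for every category.
        return {error_type: 0.0 for error_type in ERROR_TYPES}
    return {error_type: counts[error_type] / total for error_type in ERROR_TYPES}
```

Comparing this distribution between two development snapshots would show, for example, whether a retriever change shifted errors away from the retrieval category, which is the kind of tracking the abstract describes.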
Authors

Kin Kwan Leung
Layer 6 AI, Toronto, Canada
Mouloud Belbahri
Layer 6 AI, Toronto, Canada
Yi Sui
Layer 6 AI
Self-supervised learning, Explainability, Trustworthy AI
Alex Labach
Layer 6 AI, Toronto, Canada
Xueying Zhang
Layer 6 AI, Toronto, Canada
Stephen Rose
Layer 6 AI, Toronto, Canada
Jesse C. Cresswell
Layer 6 AI
Trustworthy ML, Deep Generative Modelling, Quantum Information