Classifying and Addressing the Diversity of Errors in Retrieval-Augmented Generation Systems

📅 2025-10-15

📈 Citations: 0

✨ Influential: 0

career value

132K/year

🤖 AI Summary

Real-world RAG systems suffer from heterogeneous error types that are difficult to diagnose during deployment. Method: This paper introduces the first production-oriented RAG error taxonomy, categorizing failures along four dimensions—retrieval, generation, alignment, and hallucination—and constructs RAG-ErrorBank, the first large-scale, human-annotated dataset of RAG error types, publicly released. It further designs an automated error detection and evaluation framework strictly aligned with the taxonomy, enabling fine-grained error localization and robustness quantification. Contributions/Results: Experiments demonstrate a +28.6% improvement in error identification accuracy over baselines. The framework provides interpretable, reusable diagnostic pathways for systematic debugging and optimization. All components—including source code, RAG-ErrorBank, and evaluation tools—are fully open-sourced to foster reproducible research and practical deployment.

Technology Category

Application Category

📝 Abstract

Retrieval-augmented generation (RAG) is a prevalent approach for building LLM-based question-answering systems that can take advantage of external knowledge databases. Due to the complexity of real-world RAG systems, there are many potential causes for erroneous outputs. Understanding the range of errors that can occur in practice is crucial for robust deployment. We present a new taxonomy of the error types that can occur in realistic RAG systems, examples of each, and practical advice for addressing them. Additionally, we curate a dataset of erroneous RAG responses annotated by error types. We then propose an auto-evaluation method aligned with our taxonomy that can be used in practice to track and address errors during development. Code and data are available at https://github.com/layer6ai-labs/rag-error-classification.

Problem

Research questions and friction points this paper is trying to address.

Classifying diverse error types in retrieval-augmented generation systems

Providing practical solutions to address RAG system errors

Developing automated evaluation methods for error tracking

Innovation

Methods, ideas, or system contributions that make the work stand out.

Taxonomy for classifying RAG system error types

Dataset of annotated erroneous RAG responses

Auto-evaluation method tracking errors during development

🔎 Similar Papers

Towards Understanding the Characteristics of Code Generation Errors Made by Large Language Models