🤖 AI Summary
This study addresses critical limitations of traditional Retrieval-Augmented Generation (RAG) systems—such as retrieval noise, misuse of retrieved content, weak query-document alignment, and high generation costs—and presents the first large-scale empirical comparison between enhanced RAG and agentic RAG paradigms. Leveraging a large language model–driven agent control flow, modular RAG components, and a multidimensional evaluation framework encompassing accuracy, robustness, and computational cost, the work systematically assesses performance and efficiency across diverse scenarios. Findings reveal that agentic RAG demonstrates superior adaptability in complex tasks, whereas enhanced RAG achieves higher efficiency in simpler settings. These results provide clear guidance on the trade-offs between performance and cost for real-world deployment and underscore a pathway toward more autonomous, agent-based RAG architectures.
📝 Abstract
Retrieval-Augmented Generation (RAG) systems are usually defined by the combination of a generator and a retrieval component that extracts textual context from a knowledge base to answer user queries. However, such basic implementations exhibit several limitations, including noisy or suboptimal retrieval, misuse of retrieval for out-of-scope queries, weak query-document matching, and variability or cost associated with the generator. These shortcomings have motivated the development of "Enhanced" RAG, where dedicated modules are introduced to address specific weaknesses in the workflow. More recently, the growing self-reflective capabilities of Large Language Models (LLMs) have enabled a new paradigm, which we refer to as "Agentic" RAG. In this approach, the LLM orchestrates the entire process, deciding which actions to perform, when to perform them, and whether to iterate, thereby reducing reliance on fixed, manually engineered modules. Despite the rapid adoption of both paradigms, it remains unclear which approach is preferable under which conditions. In this work, we conduct an extensive, empirically driven evaluation of Enhanced and Agentic RAG across multiple scenarios and dimensions. Our results provide practical insights into the trade-offs between the two paradigms, offering guidance on selecting the most effective RAG design for real-world applications, considering both costs and performance.