🤖 AI Summary
Current agentic RAG systems lack a unified theoretical framework, resulting in architectural fragmentation, inconsistent evaluation protocols, and insufficiently characterized reliability risks. This work addresses these challenges by formally modeling agentic RAG as a finite-horizon partially observable Markov decision process. It introduces a modular architecture and a systematic taxonomy encompassing core components such as planning, retrieval coordination, memory paradigms, and tool invocation. By analyzing the limitations of static evaluation and identifying dynamic risks inherent in autonomous loops—particularly hallucination propagation and memory contamination—the study establishes a theoretical foundation for agentic RAG and proposes reliability-oriented evaluation criteria. Furthermore, it outlines key directions for future research, including adaptive retrieval strategies, cost-aware coordination mechanisms, and effective supervision frameworks.
📝 Abstract
Retrieval-Augmented Generation (RAG) systems are increasingly evolving into agentic architectures where large language models autonomously coordinate multi-step reasoning, dynamic memory management, and iterative retrieval strategies. Despite rapid industrial adoption, current research lacks a systematic understanding of agentic RAG as a sequential decision-making system, leading to highly fragmented architectures, inconsistent evaluation methodologies, and unresolved reliability risks. This Systematization of Knowledge (SoK) paper provides the first unified framework for understanding these autonomous systems. We formalize agentic retrieval-generation loops as finite-horizon partially observable Markov decision processes, explicitly modeling their control policies and state transitions. Building upon this formalization, we develop a comprehensive taxonomy and modular architectural decomposition that categorizes systems by their planning mechanisms, retrieval orchestration, memory paradigms, and tool-invocation behaviors. We further analyze the critical limitations of traditional static evaluation practices and identify severe systemic risks inherent to autonomous loops, including compounding hallucination propagation, memory poisoning, retrieval misalignment, and cascading tool-execution vulnerabilities. Finally, we outline key open research directions spanning stable adaptive retrieval, cost-aware orchestration, formal trajectory evaluation, and oversight mechanisms, providing a roadmap for building reliable, controllable, and scalable agentic retrieval systems.
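To make the finite-horizon POMDP framing concrete, the following is a minimal, hedged sketch of an agentic retrieval-generation loop: the agent's memory plays the role of a belief state, a policy selects among retrieval and generation actions, and each step yields a partial observation (retrieved passages) of an underlying corpus. The action names, the toy corpus, and the stopping rule are illustrative assumptions for exposition, not the paper's concrete system.

```python
# Illustrative POMDP-style agentic RAG loop (toy sketch, not a real system).

# Toy corpus standing in for the unobserved world state.
CORPUS = {
    "pomdp": "A POMDP is a partially observable Markov decision process.",
    "rag": "RAG augments generation with retrieved evidence.",
}

def policy(belief, horizon_left):
    """Control policy: choose the next action from the current belief (memory)."""
    if horizon_left == 0 or belief.get("evidence"):
        return "generate"   # terminate: answer from accumulated evidence
    return "retrieve"       # otherwise, gather more evidence

def step(belief, action, query):
    """State transition + observation: update the belief with the action's result."""
    if action == "retrieve":
        # Observation: retrieved passages give only a partial view of the corpus.
        hits = [doc for key, doc in CORPUS.items() if key in query.lower()]
        return {**belief, "evidence": hits}, None
    # Terminal generate action: emit an answer grounded in the gathered evidence.
    answer = " ".join(belief.get("evidence", [])) or "No evidence found."
    return belief, answer

def run_episode(query, horizon=3):
    """Roll out the retrieval-generation loop for a bounded horizon."""
    belief, answer = {}, None
    for t in range(horizon, -1, -1):
        action = policy(belief, t)
        belief, answer = step(belief, action, query)
        if answer is not None:
            break
    return answer
```

The bounded `for` loop is what makes the horizon finite: even a policy that never accumulates evidence is forced into the terminal generate action, which is one simple way the formalization caps runaway autonomous loops.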