🤖 AI Summary
The utility, typology, and synergistic integration of self-generated documentation (Self-Docs) from large language models (LLMs) in retrieval-augmented generation (RAG) remain poorly understood, particularly for knowledge-intensive question answering. Method: This work establishes, for the first time, an interpretable Self-Docs typology grounded in Systemic Functional Linguistics and designs a multi-dimensional RAG evaluation framework rigorously tested on Natural Questions and TriviaQA. Contribution/Results: Empirical analysis reveals substantial heterogeneity in performance gains across linguistically defined Self-Docs categories; notably, hybrid usage strategies (e.g., combining "definition" and "exemplification" Self-Docs) consistently outperform standard RAG baselines. The study provides both theoretical foundations and practical guidelines for controllable Self-Docs generation, semantics-aware filtering, and effective coordination with externally retrieved documents.
📝 Abstract
The integration of documents generated by LLMs themselves (Self-Docs) alongside retrieved documents has emerged as a promising strategy for retrieval-augmented generation systems. However, previous research has primarily focused on optimizing the use of Self-Docs, while their inherent properties remain underexplored. To bridge this gap, we first investigate the overall effectiveness of Self-Docs, identifying key factors that shape their contribution to RAG performance (RQ1). Building on these insights, we develop a taxonomy grounded in Systemic Functional Linguistics to compare the influence of various Self-Docs categories (RQ2) and explore strategies for combining them with external sources (RQ3). Our findings reveal which types of Self-Docs are most beneficial and offer practical guidelines for leveraging them to achieve significant improvements in knowledge-intensive question answering tasks.
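The hybrid strategy described above (prepending typed Self-Docs to externally retrieved evidence before answering) can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's actual pipeline: the prompt templates, the `generate_self_doc` and `build_rag_prompt` helpers, and the two example types ("definition", "exemplification") are all hypothetical names chosen for clarity.

```python
# Hypothetical sketch of a hybrid Self-Docs + retrieval RAG prompt.
# All function names and prompt templates are illustrative assumptions,
# not the paper's actual implementation.

SELF_DOC_PROMPTS = {
    "definition": "Define the key concept in this question: {q}",
    "exemplification": "Give a concrete example relevant to this question: {q}",
}

def generate_self_doc(llm, question: str, doc_type: str) -> str:
    """Ask the LLM to produce one Self-Doc of the requested linguistic type."""
    return llm(SELF_DOC_PROMPTS[doc_type].format(q=question))

def build_rag_prompt(llm, question: str, retrieved_docs: list[str],
                     self_doc_types=("definition", "exemplification")) -> str:
    """Hybrid strategy: typed Self-Docs are concatenated with retrieved docs."""
    self_docs = [generate_self_doc(llm, question, t) for t in self_doc_types]
    context = "\n\n".join(self_docs + retrieved_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Stub LLM, for illustration only; a real system would call an actual model.
fake_llm = lambda prompt: f"[LLM output for: {prompt}]"

prompt = build_rag_prompt(
    fake_llm,
    "Who wrote Hamlet?",
    ["Hamlet is a tragedy by William Shakespeare."],
)
```

Under this sketch, the final answer-generation call would receive two self-generated passages plus the retrieved passage as context; semantics-aware filtering (RQ3) would decide which Self-Doc types to include per question.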