🤖 AI Summary
This study addresses a gap in current retrieval-augmented generation (RAG) systems: they rely on document representations designed for human readers and overlook what large language models need as consumers of retrieved content. Holding retrieval results fixed, the authors compare 14 document representations (an unmodified baseline plus 13 transformations spanning selection, summarization, and reformulation, including query-dependent and query-independent variants) and measure their effect on question-answering accuracy across four generator models. They also measure answer retention, which asks whether a known answer-bearing document still supports its answer after transformation. The controlled experiments indicate that answer retention is the primary determinant of generation accuracy. This suggests that gains prior work attributed to specific representational mechanisms may be partly explained by how well those mechanisms preserve answer-bearing content. When retention is high, differences in wording, structure, length, or query dependence have limited effect on accuracy: preserving the answer matters more than the form of the representation.
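To make the answer retention metric concrete, here is a minimal Python sketch. It assumes, as a simplification the paper does not state, that a transformed document "still supports its answer" when a normalized gold answer string appears in it as a whole-word span. The helper names (`normalize`, `retains_answer`, `answer_retention_rate`) and the `(query, document)` transform signature are hypothetical. The normalization follows the common SQuAD-style convention of lowercasing and stripping punctuation and English articles.

```python
import re
import string
from typing import Callable, Sequence

# Hypothetical signature: (query, document) -> transformed document.
# Query-independent transforms simply ignore the query argument.
Transform = Callable[[str, str], str]

_PUNCTUATION = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def retains_answer(document: str, gold_answers: Sequence[str]) -> bool:
    """ASSUMED proxy for 'still supports its answer': some gold alias
    occurs in the normalized document as a whole-word span."""
    padded_doc = f" {normalize(document)} "
    for answer in gold_answers:
        norm = normalize(answer)
        if norm and f" {norm} " in padded_doc:
            return True
    return False


def answer_retention_rate(
    queries: Sequence[str],
    answer_bearing_docs: Sequence[str],
    gold_answers: Sequence[Sequence[str]],
    transform: Transform,
) -> float:
    """Fraction of known answer-bearing documents that still contain
    their answer after `transform` is applied."""
    if not (len(queries) == len(answer_bearing_docs) == len(gold_answers)):
        raise ValueError("queries, documents and answers must align one-to-one")
    if not answer_bearing_docs:
        raise ValueError("need at least one answer-bearing document")
    kept = sum(
        retains_answer(transform(query, doc), answers)
        for query, doc, answers in zip(queries, answer_bearing_docs, gold_answers)
    )
    return kept / len(answer_bearing_docs)
```

Under this proxy, the identity transform (the original baseline) keeps every answer that appears verbatim, while a transform that truncates each document to its first few words can drop the answer and lower the rate. A stricter, model-based judgement of support, such as an entailment check, could replace `retains_answer` without changing the rest of the sketch.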
📝 Abstract
Retrieval-Augmented Generation (RAG) supplements a language model's input with retrieved documents, yet most RAG pipelines inherit retrieval components designed for human readers. How retrieved content should be represented when the consumer is a large language model (LLM) rather than a human is less well understood. Recent work has proposed transformations of retrieved content and identified properties that affect generation, but each examines a single transformation or property in isolation, leaving open which features of a document's representation matter most. We address this with a controlled comparison: holding retrieval fixed, we vary only the representation of retrieved documents, comparing an original baseline against thirteen transformations spanning selection, summarisation, and reformulation, in query-dependent and query-independent variants. Across these fourteen representations we measure question-answering accuracy for four generators, and for each representation we also measure answer retention: whether a known answer-bearing document still supports its answer after transformation. We find that answer retention is the primary determinant of generator accuracy; notably, when retention is high, a representation's wording, structure, length, and query-dependence have limited effect. This suggests that accuracy gains attributed to specific mechanisms in prior work may be partly explained by how well those mechanisms preserve answer-bearing content, an attribution that cannot be settled without controlling for retention.