Beyond RAG vs. Long-Context: Learning Distraction-Aware Retrieval for Efficient Knowledge Grounding

📅 2025-09-26
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address performance degradation and excessive token consumption in retrieval-augmented generation (RAG) caused by redundant external knowledge, the "lost-in-the-middle" phenomenon, and distracting passages, this paper proposes LDAR, a distraction-aware, learning-based retrieval method. LDAR models the degree to which contextual passages distract large language models (LLMs) during generation, enabling it to adaptively identify and suppress noise-prone, attention-diverting segments and to dynamically refine retrieved results. Its core innovation is a distraction-aware supervised training objective that jointly optimizes long-context modeling and retriever adaptation. Evaluated on six knowledge-intensive benchmarks, LDAR significantly outperforms conventional RAG and standalone long-context baselines: it maintains or improves output quality while reducing average token consumption by 20–35%, achieving a Pareto improvement in both effectiveness and efficiency.
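The selection behavior the summary describes can be illustrated with a toy sketch: each retrieved passage is scored by relevance minus a modeled distraction penalty, and passages are kept greedily under a token budget. The function name, field names, and the numeric scores below are hypothetical stand-ins for what the paper's learned retriever would produce, not LDAR's actual implementation.

```python
def select_passages(passages, budget):
    """Greedy distraction-aware selection (illustrative sketch, not the paper's code).

    passages: list of dicts with keys 'text', 'relevance', 'distraction', 'tokens'.
    budget:   maximum total tokens to hand to the LLM.
    """
    # Rank by net utility: relevance minus the modeled distraction penalty.
    ranked = sorted(
        passages,
        key=lambda p: p["relevance"] - p["distraction"],
        reverse=True,
    )
    selected, used = [], 0
    for p in ranked:
        # Suppress passages expected to distract the model more than they help.
        if p["relevance"] - p["distraction"] <= 0:
            continue
        # Keep only what fits in the token budget.
        if used + p["tokens"] <= budget:
            selected.append(p)
            used += p["tokens"]
    return selected, used
```

This captures the trade-off the abstract highlights: a passage can be topically relevant yet net-harmful once its distraction cost is accounted for, so dropping it both improves quality and saves tokens.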

๐Ÿ“ Abstract
Retrieval-Augmented Generation (RAG) is a framework for grounding Large Language Models (LLMs) in external, up-to-date information. However, recent advancements in context window size allow LLMs to process inputs of up to 128K tokens or more, offering an alternative strategy: supplying the full document context directly to the model, rather than relying on RAG to retrieve a subset of contexts. Nevertheless, this emerging alternative strategy has notable limitations: (i) it is token-inefficient to handle large and potentially redundant contexts; (ii) it exacerbates the 'lost in the middle' phenomenon; and (iii) under limited model capacity, it amplifies distraction, ultimately degrading LLM output quality. In this paper, we propose LDAR (Learning Distraction-Aware Retrieval), an adaptive retriever that learns to retrieve contexts in a way that mitigates interference from distracting passages, thereby achieving significantly higher performance with reduced token usage compared to long-context approaches. Extensive experiments across diverse LLM architectures and six knowledge-intensive benchmarks demonstrate the effectiveness and robustness of our approach, highlighting the importance of balancing the trade-off between information coverage and distraction.
Problem

Research questions and friction points this paper is trying to address.

Addressing token inefficiency in long-context LLM processing
Mitigating distraction and lost-in-the-middle phenomena
Improving retrieval quality for knowledge grounding efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learning Distraction-Aware Retrieval for efficient grounding
Adaptive retriever mitigates interference from distracting passages
Achieves higher performance with reduced token usage