Beyond RAG vs. Long-Context: Learning Distraction-Aware Retrieval for Efficient Knowledge Grounding

📅 2025-09-26
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address performance degradation and excessive token consumption in retrieval-augmented generation (RAG) caused by redundant external knowledge, the "lost-in-the-middle" phenomenon, and distracting passages, this paper proposes LDAR, a distraction-aware, learning-based retrieval method. LDAR models the degree to which contextual passages distract large language models (LLMs) during generation, enabling it to adaptively identify and suppress noise-prone, attention-diverting segments and to dynamically refine retrieved results. Its core innovation is a distraction-aware supervised training objective that jointly optimizes long-context modeling and retriever adaptation. Evaluated on six knowledge-intensive benchmarks, LDAR significantly outperforms conventional RAG and standalone long-context baselines: it maintains or improves output quality while reducing average token consumption by 20–35%, achieving a Pareto improvement in both effectiveness and efficiency.
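The selection behavior the summary describes can be illustrated with a toy sketch: each retrieved passage is scored by relevance minus a modeled distraction penalty, and passages are kept greedily under a token budget. The function name, field names, and the numeric scores below are hypothetical stand-ins for what the paper's learned retriever would produce, not LDAR's actual implementation.

```python
def select_passages(passages, budget):
    """Greedy distraction-aware selection (illustrative sketch, not the paper's code).

    passages: list of dicts with keys 'text', 'relevance', 'distraction', 'tokens'.
    budget:   maximum total tokens to hand to the LLM.
    """
    # Rank by net utility: relevance minus the modeled distraction penalty.
    ranked = sorted(
        passages,
        key=lambda p: p["relevance"] - p["distraction"],
        reverse=True,
    )
    selected, used = [], 0
    for p in ranked:
        # Suppress passages expected to distract the model more than they help.
        if p["relevance"] - p["distraction"] <= 0:
            continue
        # Keep only what fits in the token budget.
        if used + p["tokens"] <= budget:
            selected.append(p)
            used += p["tokens"]
    return selected, used
```

This captures the trade-off the abstract highlights: a passage can be topically relevant yet net-harmful once its distraction cost is accounted for, so dropping it both improves quality and saves tokens.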

๐Ÿ“ Abstract
Retrieval-Augmented Generation (RAG) is a framework for grounding Large Language Models (LLMs) in external, up-to-date information. However, recent advancements in context window size allow LLMs to process inputs of up to 128K tokens or more, offering an alternative strategy: supplying the full document context directly to the model, rather than relying on RAG to retrieve a subset of contexts. Nevertheless, this emerging alternative strategy has notable limitations: (i) it is token-inefficient to handle large and potentially redundant contexts; (ii) it exacerbates the 'lost in the middle' phenomenon; and (iii) under limited model capacity, it amplifies distraction, ultimately degrading LLM output quality. In this paper, we propose LDAR (Learning Distraction-Aware Retrieval), an adaptive retriever that learns to retrieve contexts in a way that mitigates interference from distracting passages, thereby achieving significantly higher performance with reduced token usage compared to long-context approaches. Extensive experiments across diverse LLM architectures and six knowledge-intensive benchmarks demonstrate the effectiveness and robustness of our approach, highlighting the importance of balancing the trade-off between information coverage and distraction.
Problem

Research questions and friction points this paper is trying to address.

Addressing token inefficiency in long-context LLM processing
Mitigating distraction and lost-in-the-middle phenomena
Improving retrieval quality for knowledge grounding efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learning Distraction-Aware Retrieval for efficient grounding
Adaptive retriever mitigates interference from distracting passages
Achieves higher performance with reduced token usage