🤖 AI Summary
To address performance degradation and excessive token consumption in retrieval-augmented generation (RAG) caused by redundant external knowledge, the "lost-in-the-middle" phenomenon, and distracting passages, this paper proposes LDAR, a distraction-aware, learning-based retrieval method. LDAR models the degree to which contextual passages distract large language models (LLMs) during generation, enabling adaptive identification and suppression of noise-prone, attention-diverting segments and dynamic refinement of retrieved results. Its core innovation lies in a distraction-aware supervised training objective that jointly optimizes long-context modeling and retriever adaptation. Evaluated on six knowledge-intensive benchmarks, LDAR significantly outperforms conventional RAG and standalone long-context baselines. It maintains or improves output quality while reducing average token consumption by 20–35%, achieving Pareto improvements in both effectiveness and efficiency.
📝 Abstract
Retrieval-Augmented Generation (RAG) is a framework for grounding Large Language Models (LLMs) in external, up-to-date information. However, recent advancements in context window size allow LLMs to process inputs of up to 128K tokens or more, offering an alternative strategy: supplying the full document context directly to the model, rather than relying on RAG to retrieve a subset of contexts. Nevertheless, this alternative strategy has notable limitations: (i) handling large and potentially redundant contexts is token-inefficient; (ii) it exacerbates the "lost in the middle" phenomenon; and (iii) under limited model capacity, it amplifies distraction, ultimately degrading LLM output quality. In this paper, we propose LDAR (Learning Distraction-Aware Retrieval), an adaptive retriever that learns to retrieve contexts in a way that mitigates interference from distracting passages, thereby achieving significantly higher performance with reduced token usage compared to long-context approaches. Extensive experiments across diverse LLM architectures and six knowledge-intensive benchmarks demonstrate the effectiveness and robustness of our approach, highlighting the importance of balancing the trade-off between information coverage and distraction.
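The abstract describes retrieval that weighs information coverage against distraction. A minimal sketch of that idea, purely illustrative and not the paper's actual method: each candidate passage gets a relevance score minus a learned distraction penalty, and only passages whose adjusted score clears a threshold are passed to the LLM. The function name, the `alpha` weight, and the threshold are all hypothetical.

```python
def select_passages(passages, relevance, distraction, alpha=1.0, threshold=0.0):
    """Keep passages whose relevance outweighs their predicted distraction.

    passages:    list of candidate passage strings
    relevance:   relevance scores, e.g. from a dense retriever (assumed given)
    distraction: predicted distraction scores for each passage
                 (hypothetical output of a distraction-aware model)
    alpha:       weight of the distraction penalty (assumed hyperparameter)
    threshold:   minimum adjusted score for a passage to be retained
    """
    # Adjusted score: relevance minus weighted distraction penalty.
    scored = [
        (p, r - alpha * d)
        for p, r, d in zip(passages, relevance, distraction)
    ]
    # Rank by adjusted score and drop passages below the threshold,
    # shrinking the context (fewer tokens) while filtering distractors.
    return [p for p, s in sorted(scored, key=lambda x: -x[1]) if s > threshold]
```

Under this sketch, a highly relevant but highly distracting passage can still be excluded, which is the coverage-versus-distraction trade-off the abstract emphasizes.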