🤖 AI Summary
Large language models suffer markedly degraded information-retrieval accuracy over very long contexts (e.g., documents tens of thousands of tokens long). To address this, we propose an adaptive attention-head scaling mechanism that requires no model fine-tuning. We first empirically observe substantial heterogeneity across attention heads in their contribution to long-range retrieval, and we quantify each head's correlation with retrieval performance using zero-shot generated data. Building on this insight, we learn lightweight per-head scaling weights that amplify critical heads and suppress redundant ones at inference time, leaving the model's original weights untouched. The method generalizes well in-domain and remains robust out-of-domain: on LongBench document QA benchmarks it delivers consistent, substantial gains in retrieval accuracy in both settings. Moreover, it is fully compatible with mainstream training-free context extension techniques, effectively extending the usable context window while preserving output reliability.
📝 Abstract
In this work, we introduce a novel approach called Scaling to Emphasize Attention for Long-context retrieval (SEAL), which enhances the retrieval performance of large language models (LLMs) over extended contexts. Previous studies have shown that each attention head in LLMs has a unique functionality and collectively contributes to the overall behavior of the model. Similarly, we observe that specific heads are closely tied to long-context retrieval, showing positive or negative correlation with retrieval scores. Building on this insight, we propose a learning-based mechanism using zero-shot generated data to emphasize these heads, improving the model's performance in long-context retrieval tasks. By applying SEAL, we achieve significant improvements in in-domain retrieval performance, including document QA tasks from LongBench, and considerable improvements in out-of-domain cases. Additionally, when combined with existing training-free context extension techniques, SEAL extends the context limits of LLMs while maintaining highly reliable outputs, opening new avenues for research in this field.
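To make the core idea concrete, here is a minimal numpy sketch of per-head output scaling of the kind the abstract describes. All names and shapes are illustrative assumptions, not taken from the paper's code: each head's output is multiplied by a learned scalar before the heads are combined, so heads positively correlated with long-context retrieval can be emphasized and others suppressed.

```python
import numpy as np

def scale_head_outputs(head_outputs: np.ndarray, head_scales: np.ndarray) -> np.ndarray:
    """Apply a learned per-head scale factor (hypothetical SEAL-style emphasis).

    head_outputs: (num_heads, seq_len, head_dim) attention-head outputs
    head_scales:  (num_heads,) learned scalars; >1 amplifies, <1 suppresses
    """
    return head_outputs * head_scales[:, None, None]

rng = np.random.default_rng(0)
num_heads, seq_len, head_dim = 4, 8, 16
heads = rng.standard_normal((num_heads, seq_len, head_dim))

# Illustrative scales: emphasize head 0, suppress head 2, leave the rest unchanged.
scales = np.array([1.5, 1.0, 0.3, 1.0])
scaled = scale_head_outputs(heads, scales)

assert scaled.shape == heads.shape
assert np.allclose(scaled[1], heads[1])        # scale 1.0 leaves the head unchanged
assert np.allclose(scaled[0], 1.5 * heads[0])  # amplified head
```

Because only the small vector of per-head scales is learned (here from zero-shot generated data, per the abstract), the base model's weights stay frozen, which is what allows SEAL to compose with training-free context extension methods.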