MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads

📅 2025-02-19
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Large language models (LLMs) are easily distracted by irrelevant information in long-context, multi-document question answering, which degrades their attention focus and factual grounding. To address this, we propose a retrieval-oriented contrastive learning method that operates directly at the attention-head level, the first approach to apply contrastive learning explicitly to individual attention heads. Our method constructs head-level positive and negative sample pairs from retrieved document snippets and jointly fine-tunes attention heads to sharpen their focus on relevant content. Evaluated on multiple long-context QA benchmarks, our approach significantly improves retrieval accuracy and answer quality over strong baselines. Attention visualizations confirm that it suppresses interference from irrelevant passages and strengthens the model's robustness in factual localization.
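The core idea above can be sketched as a contrastive (InfoNCE-style) loss over a head's attention mass: the attention a head places on the gold passage is the positive score, and the mass on distractor passages supplies the negatives. This is a minimal illustrative sketch, not the paper's implementation; the function names, the span representation, and the temperature value are assumptions.

```python
import numpy as np

def attention_mass(attn_row, spans):
    # attn_row: one head's attention weights from a query token over all
    # context tokens; spans: (start, end) token ranges, one per passage.
    return np.array([attn_row[s:e].sum() for s, e in spans])

def head_contrastive_loss(attn_row, spans, pos_idx, tau=0.1):
    """InfoNCE-style loss: attention mass on the gold passage (pos_idx)
    is the positive; mass on distractor passages gives the negatives.
    Hypothetical sketch of the head-level objective, not the paper's code."""
    scores = attention_mass(attn_row, spans) / tau
    scores -= scores.max()                      # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return -np.log(probs[pos_idx])              # small when the head focuses on the gold passage
```

Minimizing this loss pushes the head to concentrate attention on the relevant passage: if most of the attention mass already sits on the gold span, the loss is near zero, while attention spent on distractors raises it.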

๐Ÿ“ Abstract
Large Language Models (LLMs) frequently show distracted attention due to irrelevant information in the input, which severely impairs their long-context capabilities. Inspired by recent studies on the effectiveness of retrieval heads in long-context factuality, we aim to address this distraction issue by improving such retrieval heads directly. We propose Multi-Document Attention Focusing (MuDAF), a novel method that explicitly optimizes the attention distribution at the head level through contrastive learning. Experimental results show that MuDAF can significantly improve the long-context question answering performance of LLMs, especially in multi-document question answering. Extensive evaluations of retrieval scores and attention visualizations show that MuDAF has great potential to make attention heads more focused on relevant information and to reduce attention distraction.
Problem

Research questions and friction points this paper is trying to address.

LLMs exhibit distracted attention due to irrelevant long-context information
Improving retrieval heads to reduce attention distraction in multi-document settings
Enhancing question answering by focusing attention through contrastive learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive learning optimizes attention head distributions
Directly improves retrieval heads to reduce attention distractions
Focuses multi-document attention on relevant information contextually
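The contributions above hinge on building head-level positive and negative pairs from retrieved document snippets. A minimal sketch of that data-preparation step, assuming a QA example with one gold passage among distractors (the function name and dictionary layout are hypothetical, not from the paper):

```python
import random

def build_head_pairs(question, passages, gold_idx, num_negatives=3):
    """Hypothetical helper: pair the question with the gold passage
    (positive) and randomly sampled distractor passages (negatives),
    forming the contrastive training instances used to fine-tune heads."""
    distractors = [p for i, p in enumerate(passages) if i != gold_idx]
    sampled = random.sample(distractors, min(num_negatives, len(distractors)))
    return {
        "query": question,
        "positive": passages[gold_idx],   # passage the head should attend to
        "negatives": sampled,             # passages whose attention mass is penalized
    }
```

Each resulting instance supplies one positive and several negatives for the head-level contrastive objective, so fine-tuning rewards attention on the gold passage and penalizes attention spent on distractors.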