Tracing Facts or just Copies? A critical investigation of the Competition of Mechanisms in Large Language Models

📅 2025-07-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates the factual-consistency mechanisms of large language models (LLMs) when factual and counterfactual information compete, focusing on attention-head suppression behavior and its domain dependence. Method: using mechanistic interpretability techniques, we quantitatively analyze the relationship between per-layer attention-head strength and the proportion of factual outputs, and systematically evaluate suppression strategies across diverse domains. Results: (1) the attention heads that dominate factual outputs primarily perform *generic copy suppression*, attenuating repetitive or redundant tokens to improve output consistency, rather than selectively suppressing counterfactuals; (2) this suppression is strongly domain-specific, and as model scale increases, attention heads develop finer-grained specialization at the level of semantic categories. Our work reproduces and unifies several recent findings and, crucially, is the first to establish both the universality of non-selective suppression in maintaining factual consistency and the scale-driven emergence of functional specialization among attention heads.
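The head-scaling intervention described in the summary can be illustrated with a toy sketch: scale one attention head's contribution to the residual stream by a factor alpha and read off the probability assigned to the factual versus the counterfactual token. All tensors, the two-token vocabulary, and the random values below are hypothetical illustrations, not the paper's actual models or data.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical setup: a residual-stream vector, one attention head's
# output, and an unembedding to a 2-token vocabulary
# {0: factual token, 1: counterfactual token}.
rng = np.random.default_rng(0)
d_model = 8
resid = rng.normal(size=d_model)      # residual stream before the head
head_out = rng.normal(size=d_model)   # this head's contribution
W_U = rng.normal(size=(d_model, 2))   # unembedding matrix

def factual_prob(alpha):
    """Scale the head's contribution by alpha and read off P(factual)."""
    logits = (resid + alpha * head_out) @ W_U
    return softmax(logits)[0]

# Sweep the scaling factor, as in the paper's strength-vs-factual-output analysis.
for alpha in (0.0, 1.0, 3.0):
    print(f"alpha={alpha}: P(factual)={factual_prob(alpha):.3f}")
```

In the paper's framing, a head that performs generic copy suppression would shift this probability for repeated tokens regardless of whether they are factual, which is why strengthening such heads can also inhibit correct facts.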

📝 Abstract
This paper presents a reproducibility study examining how Large Language Models (LLMs) manage competing factual and counterfactual information, focusing on the role of attention heads in this process. We attempt to reproduce and reconcile findings from three recent studies, by Ortu et al.; Yu, Merullo, and Pavlick; and McDougall et al., that investigate the competition between model-learned facts and contradictory context information through mechanistic interpretability tools. Our study specifically examines the relationship between attention head strength and factual output ratios, evaluates competing hypotheses about attention heads' suppression mechanisms, and investigates the domain specificity of these attention patterns. Our findings suggest that attention heads promoting factual output do so via general copy suppression rather than selective counterfactual suppression, since strengthening them can also inhibit correct facts. Additionally, we show that attention head behavior is domain-dependent, with larger models exhibiting more specialized and category-sensitive patterns.
Problem

Research questions and friction points this paper is trying to address.

How LLMs handle competing factual and counterfactual information
Role of attention heads in suppressing or promoting facts
Domain specificity of attention patterns in large models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reproducing studies on LLM factual competition
Analyzing attention head suppression mechanisms
Investigating domain-specific attention patterns
Dante Campregher
University of Amsterdam
Yanxu Chen
University of Amsterdam
Sander Hoffman
University of Amsterdam
Maria Heuss
University of Amsterdam
Artificial Intelligence · Explainability · Fairness · Information Retrieval