The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

📅 2025-02-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the pervasive hallucination problem in large language models (LLMs), formalizing "knowledge overshadowing" (a phenomenon in which dominant knowledge suppresses less prominent knowledge, leading to factual errors) and introducing the first hallucination quantification framework grounded in a log-linear law for *a priori* hallucination rate prediction. Methodologically, it proposes a novel knowledge representation modeling approach coupled with an overshadowing-aware decoding strategy, CoDa, rigorously evaluated across multiple benchmarks (Overshadow, MemoTrap, NQ-Swap). Key contributions include: (1) the first formal definition of knowledge overshadowing; (2) empirical validation of a robust log-linear relationship between hallucination rate and knowledge popularity, knowledge length, and model size; and (3) CoDa's significant factual-accuracy improvements (+27.9% on Overshadow, +13.1% on MemoTrap, +18.3% on NQ-Swap), establishing a hallucination-mitigation paradigm that is interpretable, predictable, and intervenable.

📝 Abstract
Hallucination is a persistent challenge in large language models (LLMs), where even with rigorous quality control, models often generate distorted facts. This paradox, in which error generation continues despite high-quality training data, calls for a deeper understanding of the underlying LLM mechanisms. To address it, we propose a novel concept: knowledge overshadowing, where a model's dominant knowledge can obscure less prominent knowledge during text generation, causing the model to fabricate inaccurate details. Building on this idea, we introduce a novel framework to quantify factual hallucinations by modeling knowledge overshadowing. Central to our approach is the log-linear law, which predicts that the rate of factual hallucination increases linearly with the logarithmic scale of (1) Knowledge Popularity, (2) Knowledge Length, and (3) Model Size. The law provides a means to preemptively quantify hallucinations, offering foresight into their occurrence even before model training or inference. Building on the overshadowing effect, we propose a new decoding strategy, CoDa, to mitigate hallucinations, which notably enhances model factuality on Overshadow (27.9%), MemoTrap (13.1%), and NQ-Swap (18.3%). Our findings not only deepen our understanding of the underlying mechanisms behind hallucinations but also provide actionable insights for developing more predictable and controllable language models.
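The log-linear law in the abstract can be sketched as a simple predictor: the hallucination rate grows linearly in the logarithm of knowledge popularity, knowledge length, and model size. The function and coefficient values below are hypothetical placeholders for illustration, not coefficients fitted in the paper.

```python
import math

def predicted_hallucination_rate(popularity, length, model_size,
                                 a=0.02, b=0.02, c=0.02, bias=0.0):
    """Hypothetical log-linear hallucination-rate predictor.

    Illustrates the paper's claimed relationship: rate increases
    linearly with log(popularity), log(length), and log(model_size).
    Coefficients a, b, c and bias are made-up placeholders.
    """
    rate = (bias
            + a * math.log(popularity)
            + b * math.log(length)
            + c * math.log(model_size))
    # Clamp to a valid probability range.
    return min(max(rate, 0.0), 1.0)
```

Under this sketch, making the dominant knowledge more popular, the knowledge statement longer, or the model larger each pushes the predicted hallucination rate upward, which is the direction of the trend the paper reports.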
Problem

Research questions and friction points this paper is trying to address.

Understanding LLM hallucination mechanisms
Predicting factual hallucination rates
Mitigating hallucinations in LLM outputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantify hallucinations via log-linear law
Introduce CoDa decoding for fewer errors
Model knowledge overshadowing to predict inaccuracies
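To make the decoding idea concrete, here is a toy contrastive adjustment over next-token logits: tokens that the model favors only because of the dominant (overshadowing) context are penalized, while tokens supported specifically by the full prompt are amplified. This is a generic contrastive-decoding sketch with hypothetical inputs, not the paper's exact CoDa procedure.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax.
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def contrastive_next_token(logits_full, logits_dominant, alpha=1.0):
    """Pick the token whose log-probability under the full prompt most
    exceeds its log-probability under a dominant-knowledge-only context.

    Sketch of an overshadowing-aware contrastive adjustment; the real
    CoDa algorithm in the paper may differ in how the contrast context
    is built and how scores are combined.
    """
    lp_full = log_softmax(np.asarray(logits_full, dtype=float))
    lp_dom = log_softmax(np.asarray(logits_dominant, dtype=float))
    # Boost tokens the full prompt prefers relative to the dominant context.
    adjusted = lp_full + alpha * (lp_full - lp_dom)
    return int(np.argmax(adjusted))
```

For example, with full-prompt logits `[2.0, 1.5, 0.0]` and dominant-context logits `[2.5, 0.0, 0.0]`, greedy decoding on the full prompt alone picks token 0, but the contrastive adjustment shifts the choice to token 1, the token supported by the full prompt rather than by the overshadowing knowledge.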