Adaptive Activation Cancellation for Hallucination Mitigation in Large Language Models

📅 2026-03-10
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Large language models often generate fluent yet factually incorrect text, a phenomenon known as hallucination. This work proposes Adaptive Activation Cancellation (AAC), a training-free, inference-time intervention framework that requires neither external knowledge nor additional reasoning steps, and that adapts the principle of adaptive noise cancellation from signal processing to hallucination mitigation for the first time. AAC identifies hallucination-related neurons in the residual stream via layer-wise linear probes and suppresses their activations in real time using confidence-weighted forward hooks. Evaluated on OPT-125M, Phi-3-mini, and LLaMA-3-8B, AAC consistently improves accuracy on TruthfulQA and HaluEval while preserving WikiText-103 perplexity and MMLU performance. Notably, on LLaMA-3-8B it improves the MC1, MC2, and Token-F1 metrics, with probe selectivity 3.5–5.94 times higher than the ITI baseline.
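
The summary describes locating hallucination-related neurons with layer-wise linear probes over the residual stream. The paper does not include code here, so the following is a minimal sketch under stated assumptions: a logistic-regression probe (scikit-learn) is fit per layer on residual-stream activations labeled truthful vs. hallucinated, and the largest-magnitude probe weights are used to rank candidate H-Node dimensions. The function name find_h_nodes, the use of final-token activations, and the weight-magnitude ranking rule are illustrative choices, not the paper's exact procedure.

```python
# Illustrative sketch: layer-wise linear probing to rank candidate H-Node
# dimensions. The probe design and ranking rule are assumptions, not the
# paper's exact method.
import numpy as np
from sklearn.linear_model import LogisticRegression

def find_h_nodes(activations, labels, top_k=32):
    """Fit one linear probe per layer and rank hidden dimensions.

    activations: (n_examples, n_layers, hidden_dim) residual-stream
        activations at the final token position.
    labels: (n_examples,), 1 = hallucinated, 0 = truthful.
    Returns: dict mapping layer index -> (probe accuracy, top-k dim indices).
    """
    n_examples, n_layers, hidden_dim = activations.shape
    results = {}
    for layer in range(n_layers):
        X = activations[:, layer, :]
        probe = LogisticRegression(max_iter=1000)
        probe.fit(X, labels)
        acc = probe.score(X, labels)
        # Dimensions with the largest absolute probe weight are treated as
        # candidate H-Nodes for this layer.
        weights = np.abs(probe.coef_[0])
        top_dims = np.argsort(weights)[::-1][:top_k]
        results[layer] = (acc, top_dims)
    return results

if __name__ == "__main__":
    # Synthetic stand-in data: 200 examples, 12 layers, 256-dim residual stream.
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(200, 12, 256))
    labels = rng.integers(0, 2, size=200)
    # Plant a weak signal in a few dimensions of layer 6 so the probe has
    # something to find.
    acts[labels == 1, 6, :4] += 1.0
    h_nodes = find_h_nodes(acts, labels, top_k=4)
    print("Layer 6 probe accuracy and top dims:", h_nodes[6])
```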

πŸ“ Abstract
Large Language Models frequently generate fluent but factually incorrect text. We propose Adaptive Activation Cancellation (AAC), a real-time inference-time framework that treats hallucination-associated neural activations as structured interference within the transformer residual stream, drawing an explicit analogy to classical adaptive noise cancellation from signal processing. The framework identifies Hallucination Nodes (H-Nodes) via layer-wise linear probing and suppresses them with a confidence-weighted forward hook during auto-regressive generation, requiring no external knowledge, no fine-tuning, and no additional inference passes. Evaluated across OPT-125M, Phi-3-mini, and LLaMA-3-8B on TruthfulQA and HaluEval, the real-time hook is the only intervention that consistently improves downstream accuracy at all three scales. Critically, the method is strictly surgical: WikiText-103 perplexity and MMLU reasoning accuracy are preserved with 0.0% degradation across all three model scales, a property that distinguishes AAC from interventions that trade fluency or general capability for factual improvement. At the LLaMA-3-8B scale, the hook additionally yields positive generation-level gains (MC1 +0.04; MC2 +0.003; Token-F1 +0.003) while achieving probe-space selectivity 3.5x–5.94x higher than the ITI baseline, demonstrating that targeted neuron-level suppression can simultaneously improve factual accuracy and preserve model capability.
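
The abstract describes suppressing H-Nodes with a confidence-weighted forward hook during auto-regressive generation. Below is a minimal PyTorch/Transformers sketch of that mechanism, assuming OPT-125M (one of the evaluated models) and a hook on a decoder block's output. The H-Node indices, the per-layer probe confidences, the base strength ALPHA, and the linear scaling rule are placeholder assumptions for illustration, not the paper's calibrated values.

```python
# Illustrative sketch: confidence-weighted forward hook that damps selected
# hidden dimensions during generation. Indices, confidences, and the scaling
# rule below are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "facebook/opt-125m"  # one of the evaluated model families
H_NODES = {6: [10, 42, 87]}       # layer -> suppressed dims (placeholder values)
PROBE_CONFIDENCE = {6: 0.8}       # probe confidence per layer (placeholder)
ALPHA = 0.5                       # base suppression strength (assumed)

def make_hook(dims, confidence):
    # Suppress more strongly when the probe is more confident.
    scale = 1.0 - ALPHA * confidence
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden.clone()
        hidden[..., dims] = hidden[..., dims] * scale
        if isinstance(output, tuple):
            return (hidden,) + output[1:]
        return hidden
    return hook

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

# Register one hook per probed layer on the decoder blocks.
handles = []
for layer, dims in H_NODES.items():
    block = model.model.decoder.layers[layer]
    handles.append(block.register_forward_hook(make_hook(dims, PROBE_CONFIDENCE[layer])))

prompt = "The capital of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# Remove the hooks to restore the unmodified model.
for h in handles:
    h.remove()
```
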
Problem

Research questions and friction points this paper is trying to address.

hallucination
large language models
factual accuracy
inference-time intervention
neural activation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Activation Cancellation
Hallucination Mitigation
H-Nodes
Inference-time Intervention
Residual Stream Interference
🔎 Similar Papers
No similar papers found.
Eric Yocam
The Beacom College of Computer and Cyber Sciences, Dakota State University, Madison, SD 57042, USA.
Varghese Vaidyan
The Beacom College of Computer and Cyber Sciences, Dakota State University, Madison, SD 57042, USA.
Gurcan Comert
NCAT, Vericast, Benedict College, University of Illinois Urbana-Champaign, U of South Carolina, C2M2
transportation engineering, traffic, connected and autonomous systems
Paris Kalathas
Department of Computer Science and Software Engineering, California Polytechnic State University, San Luis Obispo, CA 93407, USA.
Yong Wang
Professor, University of Idaho
Network security
Judith L. Mwakalonge
Department of Civil and Mechanical Engineering Technology, South Carolina State University, Orangeburg, SC 29115, USA.