🤖 AI Summary
Large language models often generate fluent yet factually incorrect text, a phenomenon known as hallucination. This work proposes Adaptive Activation Cancellation (AAC), a training-free, inference-time intervention framework that requires neither external knowledge nor additional reasoning steps, adapting for the first time the principle of adaptive noise cancellation from signal processing to hallucination mitigation. AAC identifies hallucination-related neurons in the residual stream via inter-layer linear probes and suppresses their activations in real time using confidence-weighted forward hooks. Evaluated on OPT-125M, Phi-3-mini, and LLaMA-3-8B, AAC consistently improves accuracy on TruthfulQA and HaluEval while preserving performance on WikiText-103 perplexity and MMLU. Notably, on LLaMA-3-8B, it significantly enhances MC1, MC2, and Token-F1 metrics, with probe selectivity 3.5 to 5.94 times higher than the ITI baseline.
📄 Abstract
Large Language Models frequently generate fluent but factually incorrect text. We propose Adaptive Activation Cancellation (AAC), a real-time, inference-time framework that treats hallucination-associated neural activations as structured interference within the transformer residual stream, drawing an explicit analogy to classical adaptive noise cancellation from signal processing. The framework identifies Hallucination Nodes (H-Nodes) via layer-wise linear probing and suppresses them using a confidence-weighted forward hook during auto-regressive generation -- requiring no external knowledge, no fine-tuning, and no additional inference passes. Evaluated across OPT-125M, Phi-3-mini, and LLaMA-3-8B on TruthfulQA and HaluEval, the real-time hook is the only intervention that consistently improves downstream accuracy at all three scales. Critically, the method is strictly surgical: WikiText-103 perplexity and MMLU reasoning accuracy are preserved with exactly 0.0% degradation across all three model scales, a property that distinguishes AAC from interventions that trade fluency or general capability for factual improvement. At the LLaMA-3-8B scale, the hook additionally yields positive generation-level gains (MC1 +0.04; MC2 +0.003; Token-F1 +0.003) while achieving probe-space selectivity 3.5x to 5.94x higher than the ITI baseline -- demonstrating that targeted neuron-level suppression can simultaneously improve factual accuracy and preserve model capability.
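The suppression mechanism described above can be illustrated with a small sketch. The following is not the authors' implementation: the layer, the H-Node indices, the suppression strength `alpha`, and the probe `confidence` are all hypothetical stand-ins. It shows only the core idea of a confidence-weighted PyTorch forward hook that scales down flagged residual-stream dimensions while leaving the rest untouched.

```python
# Illustrative sketch of confidence-weighted activation suppression via a
# forward hook, in the spirit of AAC. All names and values are stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for one transformer block whose output feeds the residual stream.
layer = nn.Linear(16, 16)

h_nodes = [2, 5, 11]   # hypothetical neuron indices flagged by a linear probe
alpha = 0.8            # hypothetical suppression strength
confidence = 0.9       # hypothetical probe confidence for the current token

def aac_hook(module, inputs, output):
    # Scale down only the flagged dimensions; returning a tensor from a
    # forward hook replaces the module's output.
    scaled = output.clone()
    scaled[..., h_nodes] *= (1.0 - alpha * confidence)
    return scaled

handle = layer.register_forward_hook(aac_hook)

x = torch.randn(1, 16)
y = layer(x)  # hook fires here, suppressing the flagged dimensions

# Reference output computed without triggering the hook.
with torch.no_grad():
    raw = nn.functional.linear(x, layer.weight, layer.bias)

# Flagged dimensions shrink by the confidence-weighted factor ...
assert torch.allclose(y[..., h_nodes], raw[..., h_nodes] * (1.0 - alpha * confidence))
# ... while all other dimensions pass through unchanged ("surgical").
keep = [i for i in range(16) if i not in h_nodes]
assert torch.allclose(y[..., keep], raw[..., keep])

handle.remove()
```

The hook runs during each forward pass of ordinary auto-regressive decoding, which is why no extra inference passes or fine-tuning are needed; a real system would set `confidence` per token from the probe's score rather than a constant.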