🤖 AI Summary
Large language models (LLMs) frequently hallucinate when processing long, noisy retrieval-augmented contexts, often relying on spurious correlations rather than genuine causal relationships. To address this, we propose Causal Intervention Prompting (CIP), a lightweight, plug-and-play prompting framework that injects entity–action–event causal sequences at the input stage to steer models toward causally relevant evidence. CIP is a prompt-level mechanism grounded in causal intervention and counterfactual reasoning that suppresses non-causal inference pathways without any model fine-tuning. Evaluated across seven mainstream LLMs, CIP improves Attributable Rate by 2.6 points, raises Causal Consistency Score by 0.38, increases effective information density fourfold, and reduces end-to-end response latency by up to 55.1%. Together, these results point to gains in factual grounding, causal robustness, and inference efficiency.
📝 Abstract
Large language models often hallucinate when processing long and noisy retrieval contexts because they rely on spurious correlations rather than genuine causal relationships. We propose CIP, a lightweight, plug-and-play causal prompting framework that mitigates hallucinations at the input stage. CIP constructs a causal relation sequence among entities, actions, and events and injects it into the prompt to guide reasoning toward causally relevant evidence. Through causal intervention and counterfactual reasoning, CIP suppresses non-causal reasoning paths, improving factual grounding and interpretability. Experiments across seven mainstream language models, including GPT-4o, Gemini 2.0 Flash, and Llama 3.1, show that CIP consistently enhances reasoning quality and reliability, achieving a 2.6-point improvement in Attributable Rate, a 0.38 improvement in Causal Consistency Score, and a fourfold increase in effective information density. API-level profiling further shows that CIP accelerates contextual understanding and reduces end-to-end response latency by up to 55.1 percent. These results suggest that causal reasoning may serve as a promising paradigm for improving the explainability, stability, and efficiency of large language models.
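The input-stage injection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `CausalLink` type, the `render_causal_sequence` and `build_prompt` helpers, the prompt wording, and the example triples are all hypothetical, and the abstract does not say how the causal relations are extracted, so the sketch simply takes them as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CausalLink:
    """One hypothetical entity-action-event triple (assumed structure)."""
    entity: str
    action: str
    event: str


def render_causal_sequence(links):
    """Render the triples as a numbered, ordered chain of plain-text lines."""
    return "\n".join(
        f"{i}. {link.entity} -> {link.action} -> {link.event}"
        for i, link in enumerate(links, start=1)
    )


def build_prompt(question, context, links):
    """Place the causal sequence ahead of the retrieved context in the prompt.

    The instruction wording below is an assumption made for illustration.
    """
    return (
        "Causal sequence (answer using only evidence on this chain):\n"
        f"{render_causal_sequence(links)}\n\n"
        f"Retrieved context:\n{context}\n\n"
        f"Question: {question}"
    )


# Illustrative triples; in practice these would come from an extraction step.
links = [
    CausalLink("central bank", "raised interest rates", "borrowing costs rose"),
    CausalLink("households", "cut spending", "retail sales fell"),
]
prompt = build_prompt(
    "Why did retail sales fall?",
    "(long, noisy retrieved passages)",
    links,
)
print(prompt)
```

Because the sequence is prepended as plain text, a sketch like this needs no change to the model or its weights, which matches the plug-and-play framing in the abstract.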