A Systematic Literature Review of Code Hallucinations in LLMs: Characterization, Mitigation Methods, Challenges, and Future Directions for Reliable AI

📅 2025-11-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper systematically surveys hallucination in large language models (LLMs) for code, covering its root causes, mitigation strategies, code-specific challenges (including syntactic sensitivity, strict type systems, and external dependencies), and evaluation methodologies. Methodologically, it introduces the first hallucination taxonomy tailored to code generation, organizes mitigation techniques into a unified framework spanning knowledge-enhanced generation, constrained decoding, and post-editing, and reviews how program analysis, symbolic execution, and unit testing are used to detect and correct hallucinations. Synthesizing 60 studies, the work categorizes prevalent causes and technical approaches, compares static versus dynamic evaluation benchmarks, and advocates a dynamic, multi-stage verification benchmark for hallucination assessment. Its contributions include a rigorous theoretical framework and practical guidelines for improving the reliability of code LLMs in critical software engineering tasks.

📝 Abstract
Model hallucination is one of the most critical challenges faced by Large Language Models (LLMs), especially in high-stakes code intelligence tasks. As LLMs become increasingly integrated into software engineering workflows, understanding and mitigating hallucination in code becomes essential. In this survey, we provide a systematic review of hallucination phenomena in code-oriented LLMs from four key perspectives. First, we survey 60 papers to define hallucination in the context of code and summarize its primary causes, such as data noise, exposure bias, and insufficient semantic grounding, while also tracing recent trends in the literature across the natural language processing (NLP) and software engineering communities. Second, we review hallucination surveys in the broader NLP landscape and summarize representative mitigation strategies, such as knowledge-enhanced generation, constrained decoding, and post-editing. Third, we review approaches targeting code intelligence and highlight code-specific challenges that aggravate hallucination, including syntax sensitivity, strict type systems, and dependence on external libraries. Meanwhile, we analyze how emerging code intelligence tasks, e.g., program analysis, symbolic execution, and unit testing, are utilized to detect and mitigate hallucinations. Fourth, we summarize current evaluation benchmarks, ranging from static metrics to dynamic checks, e.g., compilation and execution correctness, and emphasize the need for hallucination-oriented benchmarks.
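The "dynamic checks" the abstract mentions (compilation and execution correctness) can be sketched in a few lines: compile a generated snippet, then run a unit test against it, and record which stage it survives. The candidate snippets, test string, and function names below are illustrative assumptions for this sketch, not artifacts from the paper:

```python
def dynamic_check(candidate_src: str, test_src: str) -> dict:
    """Report which verification stages an LLM-generated snippet passes."""
    result = {"compiles": False, "passes_test": False}
    try:
        # Stage 1: static/syntactic check — does the code even parse?
        code = compile(candidate_src, "<candidate>", "exec")
        result["compiles"] = True
    except SyntaxError:
        return result
    namespace: dict = {}
    try:
        exec(code, namespace)      # Stage 2: load the definitions
        exec(test_src, namespace)  # Stage 3: run assertions against them
        result["passes_test"] = True
    except Exception:              # failing assertion or runtime error
        pass
    return result

# A hallucinated snippet that calls a nonexistent helper still compiles,
# but only the dynamic (execution) stage exposes it:
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return plus(a, b)"
test = "assert add(2, 3) == 5"
print(dynamic_check(good, test))  # {'compiles': True, 'passes_test': True}
print(dynamic_check(bad, test))   # {'compiles': True, 'passes_test': False}
```

The `bad` example is the interesting case: static metrics would score it as plausible, which is exactly why the abstract argues for execution-based, hallucination-oriented benchmarks.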
Problem

Research questions and friction points this paper is trying to address.

Systematically reviews code hallucinations in LLMs
Analyzes causes and mitigation strategies for unreliable code
Identifies code-specific challenges and evaluation benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge-enhanced generation to mitigate code hallucinations
Constrained decoding techniques for reliable code generation
Post-editing methods to correct hallucinated code outputs
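One way to picture the constrained-decoding idea from the list above: at each decoding step, mask out any next token that would take the output outside an allowed grammar, so a hallucinated API name can never be emitted no matter how highly the model scores it. Everything here (the toy vocabulary, the call template, and the mock scoring function) is an invented illustration, not the survey's method:

```python
ALLOWED_CALLS = {"len", "sorted", "sum"}  # whitelist of real APIs (assumption)

def is_valid_prefix(tokens):
    """Accept only prefixes of the template: CALL '(' xs ')' <eos>."""
    template = [ALLOWED_CALLS, {"("}, {"xs"}, {")"}, {"<eos>"}]
    if len(tokens) > len(template):
        return False
    return all(tok in slot for tok, slot in zip(tokens, template))

def constrained_decode(model_scores, vocab, max_len=6):
    """Greedy decoding, skipping tokens that would break the constraint."""
    out = []
    for _ in range(max_len):
        ranked = sorted(vocab, key=lambda t: model_scores(out, t), reverse=True)
        for tok in ranked:
            if is_valid_prefix(out + [tok]):
                out.append(tok)  # highest-scoring *valid* token wins
                break
        if out and out[-1] == "<eos>":
            break
    return out

def mock_scores(prefix, tok):
    """Stand-in for LLM logits that prefers a hallucinated API name."""
    prefs = {"frobnicate": 3.0, "len": 2.0, "(": 1.5,
             "xs": 1.0, ")": 0.9, "<eos>": 0.5}
    return prefs.get(tok, 0.0)

vocab = ["frobnicate", "len", "(", "xs", ")", "<eos>"]
print(constrained_decode(mock_scores, vocab))
# → ['len', '(', 'xs', ')', '<eos>']
```

Although the mock model most prefers the nonexistent `frobnicate`, the mask forces decoding onto the best whitelisted call; real systems apply the same idea with grammars or type checkers instead of a fixed template.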