Inducing Vulnerable Code Generation in LLM Coding Assistants

📅 2025-04-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper presents HACKODE, a novel supply-chain attack targeting LLM-based coding assistants that consult external documentation (e.g., Stack Overflow, API references) during code generation. The attack exploits the models' dependency on such sources by poisoning high-ranking documents with malicious content, causing the generated code to embed critical security vulnerabilities such as buffer overflows and missing input validation. The authors are the first to systematically discover and empirically validate this "external knowledge poisoning → vulnerability injection" mechanism, and they propose a generalizable adversarial attack framework that combines prompt engineering with targeted malicious document injection. Evaluated across four models (two general LLMs and two code LLMs), the attack achieves an average success rate of 84.29%; in a realistic IDE-plugin setting, it reaches 75.92%. These results demonstrate HACKODE's practical threat and cross-platform applicability.

📝 Abstract
Due to insufficient domain knowledge, LLM coding assistants often reference related solutions from the Internet to address programming problems. However, incorporating external information into LLMs' code generation process introduces new security risks. In this paper, we reveal a real-world threat, named HACKODE, where attackers exploit referenced external information to embed attack sequences, causing LLMs to produce code with vulnerabilities such as buffer overflows and incomplete validations. We designed a prototype of the attack that generates effective attack sequences for diverse potential inputs spanning various user queries and prompt templates. In an evaluation on two general LLMs and two code LLMs, we demonstrate that the attack is effective, achieving an 84.29% success rate. Additionally, on a real-world application, HACKODE achieves a 75.92% attack success rate (ASR), demonstrating its real-world impact.
Problem

Research questions and friction points this paper is trying to address.

LLM coding assistants can generate vulnerable code when they rely on external references
Attackers can exploit referenced information to embed vulnerabilities in generated code
The HACKODE attack demonstrates high success rates in real-world scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exploiting external references to embed attacks
Generating attack sequences for diverse inputs
Achieving high success rates on multiple LLMs
Binqi Zeng
Central South University, China
Quan Zhang
Tsinghua University, China
Chijin Zhou
East China Normal University
Gwihwan Go
Tsinghua University
Yu Jiang
Tsinghua University, China
Heyuan Shi
Central South University

Topics: System Security · Software Engineering · Program Analysis