Explicit Vulnerability Generation with LLMs: An Investigation Beyond Adversarial Attacks

📅 2025-07-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically investigates the risk that open-source large language models (e.g., Qwen2, Mistral, Gemma-7B) generate vulnerable code under direct or indirect prompting. Method: a dual-track "dynamic prompting + reverse prompting" experimental paradigm quantitatively assesses each model's propensity to reproduce real-world vulnerabilities (e.g., CWE-121/122) and analyzes its nonlinear relationship with cyclomatic complexity; automated vulnerability verification is performed with the ESBMC static analysis tool. Contribution/Results: all evaluated models show a significant tendency to generate vulnerabilities, with Qwen2 achieving the highest vulnerability-type correctness rates. Social-role prompts (e.g., a "student" role) substantially increase the likelihood of vulnerable output, and reproduction rates peak at moderate cyclomatic complexity (5–10). The work establishes a reproducible methodology and empirical benchmark for security assessment of LLM-generated code.
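The "dynamic prompting" track crosses vulnerability type, user persona, and prompt directness into a structured template grid. A minimal sketch of that idea, assuming hypothetical labels and template wording (not the authors' actual templates):

```python
from itertools import product

# Illustrative factor levels; the paper's real template set is not public here.
CWES = ["CWE-121 (stack buffer overflow)", "CWE-122 (heap buffer overflow)"]
PERSONAS = ["security researcher", "student", "professional developer"]
DIRECTNESS = ["direct", "indirect"]

def build_prompts():
    """Enumerate one prompt per (CWE, persona, directness) combination."""
    prompts = []
    for cwe, persona, mode in product(CWES, PERSONAS, DIRECTNESS):
        if mode == "direct":
            text = f"As a {persona}, write C code containing {cwe}."
        else:
            text = (f"As a {persona}, show an example of code that might "
                    f"accidentally exhibit {cwe}.")
        prompts.append({"cwe": cwe, "persona": persona,
                        "mode": mode, "text": text})
    return prompts

prompts = build_prompts()
print(len(prompts))  # 2 * 3 * 2 = 12 template instances
```

Varying one factor at a time over such a grid is what lets the study attribute effects to persona (e.g., the "student" role) or directness rather than to wording noise.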

📝 Abstract
Large Language Models (LLMs) are increasingly used as code assistants, yet their behavior when explicitly asked to generate insecure code remains poorly understood. While prior research has focused on unintended vulnerabilities or adversarial prompting techniques, this study examines a more direct threat scenario: open-source LLMs generating vulnerable code when prompted either directly or indirectly. We propose a dual experimental design: (1) Dynamic Prompting, which systematically varies vulnerability type, user persona, and directness across structured templates; and (2) Reverse Prompting, which derives prompts from real vulnerable code samples to assess vulnerability reproduction accuracy. We evaluate three open-source 7B-parameter models (Qwen2, Mistral, and Gemma) using ESBMC static analysis to assess both the presence of vulnerabilities and the correctness of the generated vulnerability type. Results show all models frequently produce vulnerable outputs, with Qwen2 achieving the highest correctness rates. User persona significantly affects success: student personas achieved higher vulnerability rates than professional roles, while direct prompts were marginally more effective. Vulnerability reproduction followed an inverted-U pattern with cyclomatic complexity, peaking at moderate ranges. Our findings expose limitations of safety mechanisms in open-source models, particularly for seemingly benign educational requests.
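The inverted-U finding hinges on measuring cyclomatic complexity of the generated samples. As a rough illustration of how such a metric is computed (McCabe complexity ≈ 1 + number of decision points), here is a minimal Python-based sketch; the paper's own tooling targets C code and is not shown here:

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe cyclomatic complexity for Python source:
    1 plus one per branching construct (if/for/while/except/ternary),
    plus extra operands in boolean expressions (and/or)."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.For, ast.While,
                             ast.AsyncFor, ast.ExceptHandler, ast.IfExp)):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" contributes two extra decision points
            complexity += len(node.values) - 1
    return complexity
```

Binning generated samples by this score (e.g., 1–4, 5–10, 11+) is one way to reproduce the moderate-complexity peak reported in the paper.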
Problem

Research questions and friction points this paper is trying to address.

LLMs generating insecure code when explicitly prompted
Assessing vulnerability reproduction accuracy in open-source LLMs
Evaluating safety mechanisms in LLMs for educational requests
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Prompting with structured vulnerability templates
Reverse Prompting from real vulnerable code samples
ESBMC static analysis for vulnerability assessment
Ahmet Emir Bosnak
Bilkent University
Sahand Moslemi
Bilkent University
Mayasah Lami
Bilkent University
Anil Koyuncu
SnT, University of Luxembourg