Security Degradation in Iterative AI Code Generation - A Systematic Analysis of the Paradox

📅 2025-05-19

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study identifies a paradox in iterative code generation with large language models (LLMs): instead of improving, code security can degrade over successive feedback rounds. Method: In a controlled experiment, we ran 400 code samples through 40 rounds of iteration under four distinct prompting strategies, scanned the results with the static analyzers Semgrep and Bandit, and grouped the findings by hierarchical vulnerability clustering. Contribution/Results: Critical vulnerabilities increased by 37.6% after just five iterations, and each prompting strategy produced its own vulnerability pattern. Based on these findings, we propose a “human-in-the-loop verification” framework that calls for manual security review between LLM iterations. Our work challenges the prevailing assumption that iterative refinement inherently improves security and offers practical, process-level guidelines for LLM-assisted software development.
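The human-in-the-loop verification idea described above can be illustrated with a minimal Python sketch. It is not the authors' framework: the names `Finding`, `count_critical`, and `gated_refinement`, and the callables `refine`, `scan`, and `approve`, are hypothetical placeholders for an LLM call, a security scanner, and a human reviewer. The only point is the control flow: no refined version is accepted until a reviewer has seen the scan results for it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Finding:
    """One scanner result. Field names are hypothetical, not a real tool's schema."""
    rule_id: str
    severity: str  # e.g. "LOW", "MEDIUM", "HIGH", "CRITICAL"


def count_critical(findings: List[Finding]) -> int:
    """Count findings labeled CRITICAL."""
    return sum(1 for f in findings if f.severity == "CRITICAL")


def gated_refinement(
    code: str,
    refine: Callable[[str], str],            # assumed: wraps one LLM "improve this" call
    scan: Callable[[str], List[Finding]],    # assumed: wraps a security scanner
    approve: Callable[[int, int, int], bool],  # assumed: human decision (round, before, after)
    max_rounds: int = 5,
) -> str:
    """Run up to max_rounds refinements, accepting each only after review."""
    accepted = code
    for round_no in range(1, max_rounds + 1):
        candidate = refine(accepted)
        before = count_critical(scan(accepted))
        after = count_critical(scan(candidate))
        # The reviewer sees both counts on every round, not only when they rise.
        if not approve(round_no, before, after):
            break  # keep the last accepted version and stop iterating
        accepted = candidate
    return accepted
```

In practice `approve` would be an interactive review step rather than an automatic rule; an automatic rule such as "reject if `after > before`" is only a lower bound on what a human reviewer should check.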

📝 Abstract

The rapid adoption of Large Language Models (LLMs) for code generation has transformed software development, yet little attention has been given to how security vulnerabilities evolve through iterative LLM feedback. This paper analyzes security degradation in AI-generated code through a controlled experiment with 400 code samples across 40 rounds of "improvements" using four distinct prompting strategies. Our findings show a 37.6% increase in critical vulnerabilities after just five iterations, with distinct vulnerability patterns emerging across different prompting approaches. This evidence challenges the assumption that iterative LLM refinement improves code security and highlights the essential role of human expertise in the loop. We propose practical guidelines for developers to mitigate these risks, emphasizing the need for robust human validation between LLM iterations to prevent the paradoxical introduction of new security issues during supposedly beneficial code "improvements".
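To make the headline figure concrete, here is a small Python sketch of how a relative increase in vulnerability counts is computed. The function name `relative_increase_pct` and the counts 125 and 172 are hypothetical; the counts were chosen only because they reproduce 37.6% and are not taken from the paper's data.

```python
def relative_increase_pct(before: int, after: int) -> float:
    """Percentage change from a baseline count to a later count."""
    if before <= 0:
        raise ValueError("baseline count must be positive")
    return (after - before) / before * 100.0


# Hypothetical counts, NOT the paper's data: 125 critical findings at the
# baseline and 172 after five iterations give (172 - 125) / 125 = 0.376.
print(round(relative_increase_pct(125, 172), 1))  # 37.6
```

A relative figure like this says nothing about absolute counts, so any comparison across prompting strategies also needs the baseline for each strategy.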

Problem

Research questions and friction points this paper is trying to address.

Analyzes security degradation in AI-generated code through iterative improvements

Measures how critical vulnerabilities increase under different LLM prompting strategies

Challenges the assumption that iterative LLM refinement enhances code security

Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematically analyzes security degradation in iterative AI code generation

Proposes human validation guidelines between LLM iterations

Challenges the assumption that iterative refinement improves code security

🔎 Similar Papers

How Well Do Large Language Models Serve as End-to-End Secure Code Producers?