Security Degradation in Iterative AI Code Generation - A Systematic Analysis of the Paradox

📅 2025-05-19
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study uncovers a paradox in iterative code generation by large language models (LLMs): security risk does not shrink with successive feedback rounds; it intensifies. Method: we subjected 400 code samples to 40 controlled feedback iterations under four distinct prompting strategies, scanning each round with the static analysis tools Semgrep and Bandit and grouping findings through hierarchical vulnerability clustering. Contribution/Results: we provide the first empirical evidence that critical vulnerabilities increase by 37.6% after just five iterations, and we show that each prompting strategy yields its own characteristic vulnerability patterns. Based on these findings, we propose a "human-in-the-loop verification" framework mandating manual security review between iterations. Our work challenges the prevailing assumption that iterative refinement inherently improves security and delivers actionable, process-integrated governance guidelines for LLM-assisted software development.
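
To make the measurement loop concrete, here is a minimal sketch of how such a scan could be driven. It assumes a hypothetical layout of one directory per feedback round (iter_00/ through iter_39/) and shells out to Bandit, one of the two scanners named above, counting high-severity findings per round. This is an illustration under those assumptions, not the authors' actual pipeline.

```python
"""Sketch: scan each round of LLM-"improved" code with Bandit and
track how the count of high-severity findings evolves over rounds.
Assumes one directory per feedback round: iter_00/ ... iter_39/."""
import json
import subprocess
from pathlib import Path

def high_severity_count(sample_dir: Path) -> int:
    """Run Bandit over one round's code and count HIGH-severity issues."""
    proc = subprocess.run(
        ["bandit", "-r", str(sample_dir), "-f", "json", "-q"],
        capture_output=True, text=True,
    )
    report = json.loads(proc.stdout)
    return sum(
        1 for issue in report["results"]
        if issue["issue_severity"] == "HIGH"
    )

def track_degradation(root: Path, rounds: int = 40) -> list[int]:
    """Return the high-severity count per feedback round."""
    return [high_severity_count(root / f"iter_{i:02d}") for i in range(rounds)]

if __name__ == "__main__":
    counts = track_degradation(Path("samples"))
    baseline = counts[0] or 1  # avoid division by zero on a clean baseline
    for i, c in enumerate(counts):
        print(f"round {i:2d}: {c} high-severity findings "
              f"({100 * (c - counts[0]) / baseline:+.1f}% vs. baseline)")
```

Plotting the per-round counts this loop returns is what would surface the degradation pattern the paper reports, such as the 37.6% rise in critical findings by round five.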

📝 Abstract
The rapid adoption of Large Language Models (LLMs) for code generation has transformed software development, yet little attention has been given to how security vulnerabilities evolve through iterative LLM feedback. This paper analyzes security degradation in AI-generated code through a controlled experiment with 400 code samples across 40 rounds of "improvements" using four distinct prompting strategies. Our findings show a 37.6% increase in critical vulnerabilities after just five iterations, with distinct vulnerability patterns emerging across different prompting approaches. This evidence challenges the assumption that iterative LLM refinement improves code security and highlights the essential role of human expertise in the loop. We propose practical guidelines for developers to mitigate these risks, emphasizing the need for robust human validation between LLM iterations to prevent the paradoxical introduction of new security issues during supposedly beneficial code "improvements".
Problem

Research questions and friction points this paper is trying to address.

Analyzes security degradation in AI-generated code through iterative improvements
Identifies increased critical vulnerabilities across different LLM prompting strategies
Challenges assumption that iterative LLM refinement enhances code security
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematically analyzes security degradation in iterative AI code generation
Proposes human validation guidelines between LLM iterations (sketched after this list)
Provides first empirical evidence that iterative refinement can degrade, rather than improve, code security
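
As a concrete illustration of the validation guideline, below is a minimal sketch of an approval gate between iterations. The llm_improve() function is a hypothetical placeholder for a real model call, and Bandit (named in the summary above) stands in as the scanner. This is one possible shape for such a gate, not the authors' reference implementation.

```python
"""Sketch: a human approval gate between every LLM "improvement" round.
llm_improve() is a hypothetical placeholder for a real model call."""
import subprocess
import tempfile
from pathlib import Path

def llm_improve(code: str) -> str:
    """Hypothetical stand-in for one LLM feedback round (plug in a real API)."""
    return code  # placeholder: a real call would return revised code

def scan_with_bandit(code: str) -> str:
    """Write the candidate to a temp file and return Bandit's findings."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run(["bandit", "-q", path], capture_output=True, text=True)
    Path(path).unlink()
    return proc.stdout or "no findings"

def iterate_with_gate(code: str, rounds: int) -> str:
    """Apply LLM rounds, but keep only versions a human explicitly approves."""
    for r in range(rounds):
        candidate = llm_improve(code)
        report = scan_with_bandit(candidate)
        print(f"--- round {r}: scanner report ---\n{report}")
        if input("Accept this revision? [y/N] ").strip().lower() != "y":
            return code  # stop at the last human-approved version
        code = candidate
    return code
```

Returning the last approved version on rejection reflects the paper's core point: a later "improved" round cannot be presumed safer than an earlier one.
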
Shivani Shukla
Department of Analytics and Information Systems, University of San Francisco, San Francisco, United States
Himanshu Joshi
Indian Institute of Technology Hyderabad
Romilla Syed
Department of Management Science and Information Systems, University of Massachusetts Boston, Boston, United States