🤖 AI Summary
This study investigates how three prompt engineering paradigms, Zero-Shot, Chain-of-Thought (CoT), and Few-Shot, affect the maintainability, security, and reliability of code generated by ChatGPT. Method: Leveraging 7,583 code samples from the Dev-GPT dataset, we conduct static code quality analysis and apply the Kruskal-Wallis nonparametric test to systematically assess associations between prompt structures and multidimensional code quality metrics, an approach not systematically applied in prior work. Contribution/Results: No statistically significant differences in code quality across the three dimensions were found among the prompt types. Zero-Shot prompting exhibits the broadest adoption and the lowest overall defect rate, while structured prompts (CoT and Few-Shot) yield no empirically supported quality improvement. These findings challenge the prevalent assumption that greater prompt complexity inherently enhances code quality, providing an evidence-based benchmark and methodological guidance for prompt engineering in large language model-based code generation.
📝 Abstract
Large Language Models (LLMs) have rapidly transformed software development, especially code generation. However, their inconsistent performance, prone to hallucinations and quality issues, complicates program comprehension and hinders maintainability. Research indicates that prompt engineering, the practice of designing inputs to direct LLMs toward generating relevant outputs, may help address these challenges. To that end, researchers have introduced prompt patterns: structured templates intended to guide users in formulating their requests. However, the influence of prompt patterns on code quality has yet to be thoroughly investigated. A better understanding of this relationship is essential to advancing our collective knowledge of how to use LLMs effectively for code generation, thereby enhancing their understandability in contemporary software development. This paper empirically investigates the impact of prompt patterns on code quality, specifically maintainability, security, and reliability, using the Dev-GPT dataset. Results show that Zero-Shot prompting is the most common, followed by Zero-Shot with Chain-of-Thought and Few-Shot. Analysis of 7,583 code files across quality metrics revealed minimal issues, with Kruskal-Wallis tests indicating no significant differences among patterns, suggesting that prompt structure may not substantially affect these quality metrics in ChatGPT-assisted code generation.
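The Kruskal-Wallis comparison described above can be sketched in a few lines with `scipy.stats.kruskal`. This is a minimal illustration on synthetic data, not the study's actual Dev-GPT measurements: the group names and the Poisson-distributed issue counts are assumptions made purely to show the shape of the analysis.

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(42)

# Hypothetical per-file issue counts for each prompt pattern.
# Synthetic illustrative data only; NOT the paper's measurements.
zero_shot = rng.poisson(lam=0.8, size=200)
cot = rng.poisson(lam=0.8, size=150)
few_shot = rng.poisson(lam=0.8, size=100)

# Kruskal-Wallis H-test: a nonparametric check of whether the
# samples could come from the same distribution (no normality assumed).
stat, p_value = kruskal(zero_shot, cot, few_shot)
print(f"H = {stat:.3f}, p = {p_value:.3f}")

if p_value >= 0.05:
    print("No statistically significant difference among prompt patterns")
```

A nonparametric test is the natural choice here because static-analysis issue counts are discrete, skewed, and heavy at zero, which violates the normality assumption behind a one-way ANOVA.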