Prompting in the Wild: An Empirical Study of Prompt Evolution in Software Repositories

📅 2024-12-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
The lack of empirical understanding of how large language model (LLM) prompts evolve in real-world software development hinders reliable LLM integration. Method: We conduct the first empirical study on prompt evolution, analyzing 1,262 prompt modifications across 243 GitHub repositories. We systematically characterize modification frequency, coupling with code changes, documentation coverage, and associated engineering challenges. Contribution/Results: We find that prompt modifications predominantly occur during feature development and are mostly additions or edits—not deletions; only 21.9% are documented, leading to frequent logical inconsistencies and LLM response mismatches. We identify three core engineering challenges—prompt fragility, inadequate validation, and poor traceability—and propose dedicated prompt testing, automated verification mechanisms, and standardized documentation practices. This work provides the first empirical foundation and actionable engineering guidelines for ensuring reliability in LLM-augmented software systems.

📝 Abstract
The adoption of Large Language Models (LLMs) is reshaping software development as developers integrate these LLMs into their applications. In such applications, prompts serve as the primary means of interacting with LLMs. Despite the widespread use of LLM-integrated applications, there is limited understanding of how developers manage and evolve prompts. This study presents the first empirical analysis of prompt evolution in LLM-integrated software development. We analyzed 1,262 prompt changes across 243 GitHub repositories to investigate the patterns and frequencies of prompt changes, their relationship with code changes, documentation practices, and their impact on system behavior. Our findings show that developers primarily evolve prompts through additions and modifications, with most changes occurring during feature development. We identified key challenges in prompt engineering: only 21.9% of prompt changes are documented in commit messages, changes can introduce logical inconsistencies, and misalignment often occurs between prompt changes and LLM responses. These insights emphasize the need for specialized testing frameworks, automated validation tools, and improved documentation practices to enhance the reliability of LLM-integrated applications.
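The documentation and validation gaps described above suggest treating prompts as versioned, tested artifacts rather than inline strings. A minimal sketch of what such a prompt regression check might look like; the prompt text, field names, and hashing convention are all hypothetical illustrations, not something proposed in the paper itself:

```python
import hashlib
import string

# Hypothetical prompt template an application might ship.
SUMMARIZE_PROMPT = (
    "You are a concise assistant. Summarize the following text "
    "in at most {max_sentences} sentences:\n\n{text}"
)

# Placeholders the surrounding code fills in; a prompt edit that silently
# drops one of these is the kind of breakage the study reports.
REQUIRED_FIELDS = {"max_sentences", "text"}

# Checked-in fingerprint of the last reviewed prompt. Editing the prompt
# without updating this hash fails the check, forcing the change (and,
# ideally, its rationale) to be recorded explicitly.
APPROVED_SHA256 = hashlib.sha256(SUMMARIZE_PROMPT.encode()).hexdigest()

def prompt_fields(template: str) -> set[str]:
    """Extract the named format fields a prompt template expects."""
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}

def check_prompt(template: str, approved_hash: str) -> list[str]:
    """Return a list of problems found in a prompt template."""
    problems = []
    missing = REQUIRED_FIELDS - prompt_fields(template)
    if missing:
        problems.append(f"missing placeholders: {sorted(missing)}")
    if hashlib.sha256(template.encode()).hexdigest() != approved_hash:
        problems.append("prompt changed without updating the approved hash")
    return problems

print(check_prompt(SUMMARIZE_PROMPT, APPROVED_SHA256))  # []
```

Run in CI, a check like this would surface the undocumented prompt changes and placeholder mismatches the study observed before they reach users.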
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Prompt Engineering
Software Development
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models (LLM) in Software Development
Prompt Engineering Challenges
Reliability and Efficiency Improvements