🤖 AI Summary
Large language models (LLMs) in software development heavily rely on “developer prompts” (Dev Prompts), yet these prompts frequently exhibit critical flaws—including bias, susceptibility to prompt injection attacks, and suboptimal performance. This paper introduces PromptDoctor, the first automated detection and repair framework specifically designed for Dev Prompts. Inspired by software linting, it establishes an empirically grounded, multidimensional quality assessment and remediation paradigm. Our approach integrates a rule-based engine, LLM-driven prompt rewriting, adversarial testing, and heuristic optimization, augmented with NLP and program analysis techniques, and is delivered as a real-time VS Code extension. Evaluated on 2,173 real-world Dev Prompts, PromptDoctor achieves a 68.29% debiasing rate, a 41.81% improvement in injection resistance, and a 37.1% average performance gain. Both the tool and benchmark dataset are publicly released.
📝 Abstract
The tidal wave of advancements in Large Language Models (LLMs) has led to their swift integration into application-level logic. Many software systems now use prompts to interact with these black-box models, combining natural language with dynamic values interpolated at runtime, to perform tasks ranging from sentiment analysis to question answering. Due to the programmatic and structured natural language aspects of these prompts, we refer to them as Developer Prompts. Unlike traditional software artifacts, Dev Prompts blend natural language instructions with artificial languages such as programming and markup languages, thus requiring specialized tools for analysis, distinct from classical software evaluation methods. In response to this need, we introduce PromptDoctor, a tool explicitly designed to detect and correct issues of Dev Prompts. PromptDoctor identifies and addresses problems related to bias, vulnerability, and sub-optimal performance in Dev Prompts, helping mitigate their possible harms. In our analysis of 2,173 Dev Prompts, selected as a representative sample of 40,573 Dev Prompts, we found that 3.46% contained one or more forms of bias, 10.75% were vulnerable to prompt injection attacks. Additionally, 3,310 were amenable to automated prompt optimization. To address these issues, we applied PromptDoctor to the flawed Dev Prompts we discovered. PromptDoctor de-biased 68.29% of the biased Dev Prompts, hardened 41.81% of the vulnerable Dev Prompts, and improved the performance of 37.1% sub-optimal Dev Prompts. Finally, we developed a PromptDoctor VSCode extension, enabling developers to easily enhance Dev Prompts in their existing development workflows. The data and source code for this work are available at