Debugging Without Error Messages: How LLM Prompting Strategy Affects Programming Error Explanation Effectiveness

📅 2025-01-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates large language models' (LLMs) ability to comprehend and explain erroneous code *in the absence of compiler error messages*, targeting improved AI-assisted debugging feedback for programming novices. It presents a systematic evaluation of GPT-3.5 on raw buggy code inputs -- without diagnostic outputs -- and assesses the effectiveness of the resulting explanations. Methodologically, the authors propose and validate a one-shot prompting strategy tailored for beginners, augmented by lightweight supervised fine-tuning. Experiments indicate that providing targeted context and explicit error-localization guidance enhances explanation clarity and pedagogical utility. The work establishes an empirical baseline and a reproducible optimization framework for LLM-based programming education in compiler-error-free scenarios, addressing a methodological gap in AI support for novice debugging.

📝 Abstract
Making errors is part of the programming process -- even for the most seasoned professionals. Novices in particular are bound to make many errors while learning. It is well known that traditional (compiler/interpreter) programming error messages have been less than helpful for many novices: they can be frustrating, contain confusing jargon, and be downright misleading. Recent work has found that large language models (LLMs) can generate excellent error explanations, but that the effectiveness of these explanations heavily depends on whether the LLM has been provided with context -- typically the original source code where the problem occurred. Knowing that programming error messages can be misleading and/or contain information that serves little-to-no use (particularly for novices), we explore the reverse: what happens when GPT-3.5 is prompted for error explanations on just the erroneous source code itself -- with the original compiler/interpreter-produced error message excluded. We utilized various strategies to make the error explanations more effective, including one-shot prompting and fine-tuning. We report baseline results on how effective the error explanations are at providing feedback, as well as how various prompting strategies might improve the explanations' effectiveness. Our results can help educators understand how LLMs respond to the kinds of prompts novices are bound to make, and will hopefully lead to more effective use of Generative AI in the classroom.
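The setup described above -- prompting the model with only the erroneous source, the compiler/interpreter message deliberately withheld -- can be sketched as a one-shot prompt builder. The wording and the worked example below are hypothetical illustrations, not the authors' actual prompt:

```python
# Sketch of a "code-only" one-shot prompt: the model sees the buggy source
# and a single worked example, but never the compiler/interpreter message.
# The example pair and instruction text are assumptions for illustration.

ONE_SHOT_EXAMPLE = (
    "Code:\n"
    "for i in range(10)\n"
    "    print(i)\n"
    "Explanation: The `for` statement is missing a colon at the end of the "
    "line, so the loop body is never reached.\n"
)

def build_prompt(buggy_code: str) -> str:
    """Build a one-shot prompt from the buggy code alone; no error
    message from the compiler/interpreter is ever included."""
    return (
        "You are helping a novice programmer. Explain, in plain language, "
        "what is wrong with the following code.\n\n"
        + ONE_SHOT_EXAMPLE
        + "\nCode:\n"
        + buggy_code
        + "\nExplanation:"
    )

# The resulting string would be sent to GPT-3.5 as-is; note the prompt
# contains the buggy code but no diagnostic output.
prompt = build_prompt('print("hello"')
```

The point of the design is what is *absent*: because no `SyntaxError` text or line number is supplied, any localization in the model's explanation must come from the model's own reading of the code.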
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Error Understanding
Programming Mistakes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Programming Error Detection
Educational Implications
Audrey Salmon
Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA
Katie Hammer
School of Computer Science, North Carolina State University, Raleigh, USA
Eddie Antonio Santos
University College Dublin
computing education
programming error messages
language reclamation
Brett A. Becker
University College Dublin
Computing Education
Computer Science Education
Informatics Education