Debugging Without Error Messages: How LLM Prompting Strategy Affects Programming Error Explanation Effectiveness

📅 2025-01-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates large language models' (LLMs) ability to comprehend and explain erroneous code *in the absence of compiler error messages*, targeting improved AI-assisted debugging feedback for programming novices. It presents a systematic evaluation of GPT-3.5 on raw buggy code inputs -- without diagnostic outputs -- and assesses the effectiveness of the resulting explanations. Methodologically, the authors propose and validate a one-shot prompting strategy tailored for beginners, augmented by lightweight supervised fine-tuning. Experiments indicate that providing targeted context and explicit error-localization guidance enhances explanation clarity and pedagogical utility. The work establishes an empirical baseline and a reproducible optimization framework for LLM-based programming education in compiler-error-free scenarios, addressing a methodological gap in AI support for novice debugging.

📝 Abstract
Making errors is part of the programming process -- even for the most seasoned professionals. Novices in particular are bound to make many errors while learning. It is well known that traditional (compiler/interpreter) programming error messages have been less than helpful for many novices: they can be frustrating, contain confusing jargon, and be downright misleading. Recent work has found that large language models (LLMs) can generate excellent error explanations, but that the effectiveness of these explanations heavily depends on whether the LLM has been provided with context -- typically the original source code where the problem occurred. Knowing that programming error messages can be misleading and/or contain information that serves little-to-no use (particularly for novices), we explore the reverse: what happens when GPT-3.5 is prompted for error explanations on just the erroneous source code itself -- with the original compiler/interpreter-produced error message excluded. We utilized various strategies to make the error explanations more effective, including one-shot prompting and fine-tuning. We report baseline results on how effective the error explanations are at providing feedback, as well as how various prompting strategies might improve the explanations' effectiveness. Our results can help educators understand how LLMs respond to the kinds of prompts novices are bound to make, and will hopefully lead to more effective use of Generative AI in the classroom.
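The setup described above -- prompting the model with only the erroneous source, the compiler/interpreter message deliberately withheld -- can be sketched as a one-shot prompt builder. The wording and the worked example below are hypothetical illustrations, not the authors' actual prompt:

```python
# Sketch of a "code-only" one-shot prompt: the model sees the buggy source
# and a single worked example, but never the compiler/interpreter message.
# The example pair and instruction text are assumptions for illustration.

ONE_SHOT_EXAMPLE = (
    "Code:\n"
    "for i in range(10)\n"
    "    print(i)\n"
    "Explanation: The `for` statement is missing a colon at the end of the "
    "line, so the loop body is never reached.\n"
)

def build_prompt(buggy_code: str) -> str:
    """Build a one-shot prompt from the buggy code alone; no error
    message from the compiler/interpreter is ever included."""
    return (
        "You are helping a novice programmer. Explain, in plain language, "
        "what is wrong with the following code.\n\n"
        + ONE_SHOT_EXAMPLE
        + "\nCode:\n"
        + buggy_code
        + "\nExplanation:"
    )

# The resulting string would be sent to GPT-3.5 as-is; note the prompt
# contains the buggy code but no diagnostic output.
prompt = build_prompt('print("hello"')
```

The point of the design is what is *absent*: because no `SyntaxError` text or line number is supplied, any localization in the model's explanation must come from the model's own reading of the code.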
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Error Understanding
Programming Mistakes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Programming Error Detection
Educational Implications
Audrey Salmon
Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA
Katie Hammer
School of Computer Science, North Carolina State University, Raleigh, USA
Eddie Antonio Santos
University College Dublin
computing education
programming error messages
language reclamation
Brett A. Becker
University College Dublin
Computing Education
Computer Science Education
Informatics Education