Generating Planning Feedback for Open-Ended Programming Exercises with LLMs

📅 2025-04-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing open-ended programming exercises rely on test-case-based automated grading, which fails to provide feedback on students' problem-solving plans, such as divide-and-conquer or traversal strategies, especially when the submitted code contains syntax errors.
Method: This paper introduces the first large language model (LLM)-based approach to programming plan recognition, capable of inferring high-level, abstract solution logic from syntactically erroneous code. The authors employ plan-pattern-oriented prompt engineering and fine-tune GPT-4o-mini to explicitly recover implicit planning structures.
Contribution/Results: The method achieves plan recognition accuracy comparable to GPT-4o while significantly outperforming traditional static-analysis baselines. It enables lightweight, real-time pedagogical feedback, advancing LLM applications in programming education from the "code execution layer" to the "cognitive planning layer" and bridging a critical gap in the formative assessment of computational thinking and strategic reasoning.
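
To make the prompting idea concrete, here is a minimal sketch of plan-pattern-oriented plan recognition, assuming the OpenAI Python SDK. The `detect_plans` helper, the plan labels, and the prompt wording are illustrative assumptions, not the paper's actual prompts or plan taxonomy.

```python
# Hypothetical sketch: ask an LLM which high-level programming plans
# appear in a (possibly syntactically broken) student submission.
# Plan labels and prompt text are invented for illustration.
from openai import OpenAI

client = OpenAI()

PLAN_LABELS = [
    "accumulate a running total over a collection",
    "filter items that satisfy a condition",
    "find the maximum or minimum element",
    "count occurrences matching a predicate",
]

def detect_plans(student_code: str) -> str:
    """Ask the model which high-level plans appear in possibly broken code."""
    prompt = (
        "The following student program may contain syntax errors. "
        "Ignore the syntax problems and identify which of these high-level "
        "programming plans the student is attempting:\n"
        + "\n".join(f"- {label}" for label in PLAN_LABELS)
        + "\n\nStudent code:\n"
        + student_code
        + "\n\nAnswer with the matching plan labels only."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output suits autograding pipelines
    )
    return response.choices[0].message.content

# Example: a submission missing a colon still reveals an
# "accumulate a running total" plan.
buggy = "total = 0\nfor x in nums\n    total += x\nprint(total)"
print(detect_plans(buggy))
```

Because the prompt instructs the model to look past syntax errors, a submission that would fail every test case (or fail to compile at all) can still be mapped to the plan the student was attempting.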

📝 Abstract
To complete an open-ended programming exercise, students need to both plan a high-level solution and implement it using the appropriate syntax. However, these problems are often autograded on the correctness of the final submission through test cases, and students cannot get feedback on their planning process. Large language models (LLMs) may be able to generate this feedback by detecting the overall code structure even for submissions with syntax errors. To this end, we propose an approach that detects which high-level goals and patterns (i.e., programming plans) exist in a student program with LLMs. We show that both the full GPT-4o model and a small variant (GPT-4o-mini) can detect these plans with remarkable accuracy, outperforming baselines inspired by conventional approaches to code analysis. We further show that the smaller, cost-effective variant (GPT-4o-mini) achieves results on par with the state of the art (GPT-4o) after fine-tuning, creating promising implications for using smaller models in real-time grading. These smaller models can be incorporated into autograders for open-ended code-writing exercises to provide feedback on students' implicit planning skills, even when their program is syntactically incorrect. Furthermore, LLMs may be useful in providing feedback for problems in other domains where students start with a set of high-level solution steps and iteratively compute the output, such as math and physics problems.
Problem

Research questions and friction points this paper is trying to address.

Providing feedback on the planning process in programming exercises
Detecting high-level goals in student code using LLMs
Enabling real-time grading with cost-effective smaller models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLMs to detect high-level programming plans
Fine-tunes smaller models for cost-effective feedback (see the sketch after this list)
Provides feedback on planning despite syntax errors
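
As a rough illustration of the fine-tuning step, the sketch below prepares training data in OpenAI's chat-format JSONL, which is what supervised fine-tuning of GPT-4o-mini expects. The example submissions, plan labels, and file name are invented for illustration and are not the paper's dataset.

```python
# Hypothetical sketch: write plan-recognition training examples to JSONL
# in OpenAI's chat fine-tuning format. Examples are invented.
import json

training_pairs = [
    {
        "code": "total = 0\nfor x in nums\n    total += x",  # note: syntax error
        "plans": ["accumulate a running total over a collection"],
    },
    {
        "code": "best = nums[0]\nfor x in nums:\n    if x > best: best = x",
        "plans": ["find the maximum element"],
    },
]

with open("plan_recognition.jsonl", "w") as f:
    for pair in training_pairs:
        record = {
            "messages": [
                {"role": "system",
                 "content": "Identify the programming plans in the student code."},
                {"role": "user", "content": pair["code"]},
                {"role": "assistant", "content": json.dumps(pair["plans"])},
            ]
        }
        f.write(json.dumps(record) + "\n")
```

The resulting file could then be uploaded through OpenAI's fine-tuning API; the paper's finding is that a model tuned this way matches GPT-4o's recognition accuracy at a cost low enough for real-time autograder feedback.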
Mehmet Arif Demirtaş
University of Illinois Urbana-Champaign, Urbana IL 61801, USA
Claire Zheng
University of Illinois Urbana-Champaign, Urbana IL 61801, USA
Max Fowler
Assistant Teaching Professor, University of Illinois Urbana-Champaign, Urbana IL 61801, USA
Kathryn Cunningham
University of Illinois Urbana-Champaign, Urbana IL 61801, USA