Mark My Works Autograder for Programming Courses

📅 2026-01-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of delivering timely and personalized feedback in large-scale programming courses. To this end, the authors propose a locally deployed automated grading system that uniquely integrates role-based prompt engineering with large language models (LLMs). The system validates functional correctness through unit tests and leverages LLMs to generate interpretable, pedagogically oriented feedback on code quality while maintaining transparency in its reasoning process. In a pilot deployment involving 191 students, the AI-generated scores showed no significant linear correlation with human grades (r = −0.177) but exhibited a similar distribution shape. Although the AI scoring was notably more conservative (mean = 59.95 vs. 80.53), it substantially outperformed human graders in coverage of technical details and depth of feedback.

📝 Abstract
Large programming courses struggle to provide timely, detailed feedback on student code. We developed Mark My Works, a local autograding system that combines traditional unit testing with LLM-generated explanations. The system uses role-based prompts to analyze submissions, critique code quality, and generate pedagogical feedback while maintaining transparency in its reasoning process. We piloted the system in a 191-student engineering course, comparing AI-generated assessments with human grading on 79 submissions. While AI scores showed no linear correlation with human scores (r = -0.177, p = 0.124), both systems exhibited similar left-skewed distributions, suggesting they recognize comparable quality hierarchies despite different scoring philosophies. The AI system demonstrated more conservative scoring (mean: 59.95 vs 80.53 human) but generated significantly more detailed technical feedback.
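The abstract describes a two-stage pipeline: unit tests validate functional correctness, and a role-based prompt casts the LLM as a tutor that critiques code quality and explains its reasoning. A minimal sketch of that idea is below; the function names (`run_unit_tests`, `build_review_prompt`) and prompt wording are illustrative assumptions, not the paper's actual implementation.

```python
def run_unit_tests(submission_fn, cases):
    """Score functional correctness: fraction of (args, expected) cases passed."""
    passed = sum(1 for args, expected in cases if submission_fn(*args) == expected)
    return passed / len(cases)

def build_review_prompt(code, test_score):
    """Role-based prompt: the LLM is cast as a programming tutor, and the
    unit-test result is included so its feedback stays grounded in correctness."""
    return (
        "You are an experienced programming tutor grading a student submission.\n"
        f"Unit tests passed: {test_score:.0%}.\n"
        "Critique the code quality and give pedagogical feedback, "
        "explaining your reasoning step by step.\n\n"
        f"Student code:\n{code}"
    )

# Example: grade a hypothetical student submission.
code = "def add(a, b):\n    return a + b"
namespace = {}
exec(code, namespace)  # load the student's function (sandboxing omitted for brevity)
score = run_unit_tests(namespace["add"], [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)])
prompt = build_review_prompt(code, score)
```

In a real deployment the prompt would be sent to a locally hosted LLM and the response returned to the student alongside the test score; that call is omitted here since the paper's model and API are not specified.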
Problem

Research questions and friction points this paper is trying to address.

autograding, programming education, student feedback, code assessment, large-scale courses
Innovation

Methods, ideas, or system contributions that make the work stand out.

autograding, LLM-generated feedback, role-based prompting, pedagogical feedback, code quality assessment
Yiding Qiu
School of Engineering and Technology, UNSW Canberra, Australia
Seyed Mahdi Azimi
School of Engineering and Technology, UNSW Canberra, Australia
Artem Lensky
UNSW Canberra at the Australian Defence Force Academy
Defence, Artificial Intelligence, Quantitative Finance, Medical Data Analysis, Education