🤖 AI Summary
Current automated scoring systems rely heavily on large-scale annotated datasets, provide generic feedback, and overemphasize numerical scores at the expense of the learning experience, which undermines both scoring fairness and instructional effectiveness. This paper introduces the first zero-shot large language model (LLM)-based grading framework explicitly designed around the student's educational experience: it requires no fine-tuning and no in-context examples. Through structured prompt engineering, it evaluates both the computational correctness and the explanatory reasoning of student responses while generating actionable, personalized learning feedback. The approach eliminates the dependence on labeled data and hand-crafted rubrics, preserving scoring consistency while making assessment more supportive of learning. Empirical evaluation in university courses demonstrates significant improvements in scoring accuracy, depth of student understanding, learning motivation, and pre-class preparation, enabling scalable, high-quality instructional feedback.
📝 Abstract
Automated grading has become an essential tool in educational technology due to its ability to efficiently assess large volumes of student work, provide consistent and unbiased evaluations, and deliver immediate feedback that enhances learning. However, current systems face significant limitations, including the reliance of few-shot learning methods on large annotated datasets, a lack of personalized and actionable feedback, and an overemphasis on benchmark performance rather than on the student experience. To address these challenges, we propose a Zero-Shot Large Language Model (LLM)-Based Automated Assignment Grading (AAG) system. This framework leverages prompt engineering to evaluate both computational and explanatory student responses without requiring additional training or fine-tuning. The AAG system delivers tailored feedback that highlights individual strengths and areas for improvement, thereby enhancing student learning outcomes. Our study demonstrates the system's effectiveness through comprehensive evaluations, including survey responses from higher-education students that indicate significant improvements in motivation, understanding, and preparedness compared with traditional grading methods. The results validate the AAG system's potential to transform educational assessment by prioritizing the learning experience and providing scalable, high-quality feedback.
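The paper's exact prompts are not reproduced above, but the core mechanism, a single structured zero-shot prompt that scores both computational correctness and explanatory reasoning and returns personalized feedback, can be sketched. The Python sketch below is illustrative only: the OpenAI chat-completions client, the `gpt-4o` model name, the rubric dimensions, and the JSON output schema are assumptions for the example, not details taken from the paper.

```python
import json

from openai import OpenAI  # assumes the openai>=1.0 Python SDK; any chat-capable LLM client would do

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical zero-shot grading prompt: no fine-tuning, no in-context examples,
# no human-labeled rubric data. The two rubric dimensions and the JSON schema
# below are illustrative stand-ins for the paper's structured prompt design.
SYSTEM_PROMPT = """You are an automated assignment grader for a university course.
Grade the student's answer against the question alone; no example solutions are provided.
Assess two dimensions:
  1. computational correctness of the final result,
  2. quality of the explanatory reasoning.
Return strict JSON with keys: "score" (0-10), "correctness", "reasoning",
"strengths", "improvements"."""


def grade(question: str, student_answer: str) -> dict:
    """Zero-shot grading: one structured prompt per submission, no labeled examples."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model choice; the framework is not tied to a specific LLM
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Question:\n{question}\n\nStudent answer:\n{student_answer}"},
        ],
        temperature=0,  # deterministic decoding, for scoring consistency across submissions
        response_format={"type": "json_object"},  # constrain the reply to parseable JSON
    )
    return json.loads(response.choices[0].message.content)


if __name__ == "__main__":
    result = grade(
        question="Compute the derivative of f(x) = x^2 sin(x) and explain each step.",
        student_answer=(
            "f'(x) = 2x sin(x) + x^2 cos(x), by the product rule: differentiate x^2 "
            "to get 2x and keep sin(x); then keep x^2 and differentiate sin(x) to get cos(x)."
        ),
    )
    print(json.dumps(result, indent=2))
```

Because the prompt carries the whole rubric and no graded examples are supplied, the same call can be reused across courses and question types without collecting training data, which is the property the zero-shot framing is meant to provide; the "strengths" and "improvements" fields stand in for the personalized feedback the system returns alongside the score.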