AI Summary
This study addresses the challenge of automatically extracting interpretable, pedagogically grounded feedback indicators from students' written assignments in language learning courses to support high-quality formative assessment. Methodologically, it systematically investigates, without fine-tuning, the capability of Llama 3.1 to extract multi-dimensional feedback indicators (e.g., grammatical accuracy, pragmatic appropriateness, logical coherence) aligned with teacher-defined rubrics. Indicator-level agreement with human expert annotations is quantitatively evaluated. The results demonstrate statistically significant strong correlations across all dimensions (r > 0.85, p < 0.001) and robust generalization to unseen indicator combinations. This work establishes a verifiable, interpretable, and pedagogically aligned paradigm for LLM-driven automated feedback generation, enhancing both the efficiency and the reliability of formative assessment in language education.
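The paper itself does not include an implementation, but the extraction step summarized above can be sketched roughly as follows: an off-the-shelf (not fine-tuned) Llama 3.1 instruct model is prompted with the teacher-defined rubric criteria and a student submission, and asked to return per-criterion indicator scores. The model identifier, rubric wording, scoring scale, and JSON output format below are illustrative assumptions, not the study's exact setup.

```python
import json
from transformers import pipeline

# Off-the-shelf instruct model (assumed variant); no fine-tuning involved.
generator = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")

# Teacher-defined rubric criteria and a student submission (both illustrative).
criteria = ["grammatical accuracy", "pragmatic appropriateness", "logical coherence"]
submission = "Yesterday I go to the market and buyed two apple for my breakfast."

messages = [
    {
        "role": "system",
        "content": (
            "You assess submissions in a language learning course. Rate the student "
            "text on each criterion from 1 (poor) to 5 (excellent). Reply only with "
            "a JSON object mapping each criterion to its score."
        ),
    },
    {
        "role": "user",
        "content": f"Criteria: {', '.join(criteria)}\n\nStudent text:\n{submission}",
    },
]

# The chat-style pipeline returns the conversation with the model's reply appended.
result = generator(messages, max_new_tokens=128)
indicators = json.loads(result[0]["generated_text"][-1]["content"])
print(indicators)  # e.g. {"grammatical accuracy": 2, "pragmatic appropriateness": 3, ...}
```

Constraining the reply to a small JSON object keeps the indicators machine-readable, so they can subsequently be compared against human ratings or assembled into rubric-aligned feedback.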
Abstract
Automated feedback generation has the potential to enhance students' learning progress by providing timely and targeted feedback. Moreover, it can assist teachers in optimizing their time, allowing them to focus on more strategic and personalized aspects of teaching. To generate high-quality, information-rich formative feedback, it is essential first to extract relevant indicators, as these serve as the foundation upon which the feedback is constructed. Teachers often employ feedback criteria grids composed of various indicators that they evaluate systematically. This study examines the initial phase of extracting such indicators from students' submissions in a language learning course using the large language model Llama 3.1. Accordingly, the alignment between indicators generated by the LLM and human ratings across various feedback criteria is investigated. The findings demonstrate statistically significant strong correlations, even in cases involving unanticipated combinations of indicators and criteria. The methodology employed in this paper offers a promising foundation for extracting indicators from students' submissions using LLMs. Such indicators can potentially be utilized to auto-generate explainable and transparent formative feedback in future research.
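As a rough illustration of the alignment analysis described in the abstract, the sketch below computes a Pearson correlation (with a two-sided p-value) between LLM-derived indicator scores and human ratings for each feedback criterion. The criterion names and score values are invented for demonstration; the study's actual data, rating scales, and statistical procedure may differ.

```python
from scipy.stats import pearsonr

# Illustrative scores on a 1-5 scale for eight submissions; real values would
# come from the LLM extraction step and from the human annotators.
llm_scores = {
    "grammatical_accuracy":      [4, 3, 5, 2, 4, 5, 3, 4],
    "pragmatic_appropriateness": [3, 3, 4, 2, 5, 4, 3, 4],
    "logical_coherence":         [5, 2, 4, 3, 4, 5, 2, 4],
}
human_scores = {
    "grammatical_accuracy":      [4, 3, 5, 3, 4, 5, 3, 5],
    "pragmatic_appropriateness": [3, 2, 4, 2, 5, 4, 4, 4],
    "logical_coherence":         [5, 2, 5, 3, 4, 5, 2, 3],
}

# Indicator-level agreement: Pearson r and two-sided p-value per criterion.
for criterion, llm in llm_scores.items():
    r, p = pearsonr(llm, human_scores[criterion])
    print(f"{criterion}: r = {r:.2f}, p = {p:.4f}")
```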