🤖 AI Summary
Current LLM-based automated evaluation of undergraduate theses (UGTEs) typically yields only a single holistic score, which fails to capture pedagogically critical aspects such as structural criteria, pedagogical objectives, and diverse academic competencies. To address this, we propose PEMUTA, a pedagogically enriched framework for fine-grained UGTE assessment. PEMUTA integrates Vygotsky's Zone of Proximal Development and Bloom's Taxonomy into prompt engineering, using a hierarchical prompting scheme that evaluates each thesis on six dimensions, namely Structure, Logic, Originality, Writing, Proficiency, and Rigor (SLOWPR), and then synthesizes a holistic assessment. Few-shot and role-play prompting align the large language model with expert judgment without model fine-tuning. Experiments on a curated dataset of authentic UGTEs with expert SLOWPR-aligned annotations show strong agreement between PEMUTA and human experts across dimensions (reported mean Cohen's κ = 0.82), improving both the pedagogical validity and the explanatory granularity of automated thesis evaluation.
📝 Abstract
The undergraduate thesis (UGTE) plays an indispensable role in assessing a student's cumulative academic development throughout their college years. Although large language models (LLMs) have advanced educational intelligence, they typically perform holistic assessment that yields a single evaluation score and ignore the nuances across multifaceted criteria, which limits their ability to reflect structural criteria, pedagogical objectives, and diverse academic competencies. Meanwhile, pedagogical theories have long informed manual UGTE evaluation through multi-dimensional assessment of cognitive development, disciplinary thinking, and academic performance, yet they remain underutilized in automated settings. Motivated by this research gap, we pioneer PEMUTA, a pedagogically enriched framework that activates domain-specific knowledge in LLMs for multi-granular UGTE assessment. Guided by Vygotsky's theory and Bloom's Taxonomy, PEMUTA incorporates a hierarchical prompting scheme that evaluates UGTEs across six fine-grained dimensions: Structure, Logic, Originality, Writing, Proficiency, and Rigor (SLOWPR), followed by holistic synthesis. Two in-context learning techniques, i.e., few-shot prompting and role-play prompting, are also incorporated to further enhance alignment with expert judgments without fine-tuning. We curate a dataset of authentic UGTEs with expert-provided SLOWPR-aligned annotations to support multi-granular UGTE assessment. Extensive experiments demonstrate that PEMUTA aligns closely with expert evaluations and shows strong potential for fine-grained, pedagogically informed UGTE evaluation.
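The hierarchical scheme described in the abstract (one assessment per SLOWPR dimension, each with a role-play preamble and few-shot examples, followed by a holistic synthesis) could be sketched roughly as below. This is only an illustrative sketch, not the authors' implementation: the role text, the 1-5 score scale, the `Example` fields, the prompt wording, and the `llm` callable (any function mapping a prompt string to a reply string) are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# The six dimensions named in the abstract.
SLOWPR = ("Structure", "Logic", "Originality", "Writing", "Proficiency", "Rigor")

# Hypothetical role-play preamble; the paper's actual wording is not given here.
ROLE = "You are an experienced thesis supervisor grading an undergraduate thesis."


@dataclass
class Example:
    """One expert-annotated few-shot example (field names are assumed)."""
    excerpt: str
    score: int  # assumed 1-5 scale
    rationale: str


def build_dimension_prompt(dimension: str, thesis: str,
                           examples: Sequence[Example]) -> str:
    """Compose role text, few-shot examples, and the thesis for one dimension."""
    if dimension not in SLOWPR:
        raise ValueError(f"unknown dimension: {dimension}")
    shots = "\n\n".join(
        f"Excerpt: {ex.excerpt}\nScore: {ex.score}\nRationale: {ex.rationale}"
        for ex in examples
    )
    return (
        f"{ROLE}\n"
        f"Assess only the dimension '{dimension}' on a 1-5 scale.\n\n"
        f"Annotated examples:\n{shots}\n\n"
        f"Thesis:\n{thesis}\n\n"
        "Reply with a score and a short rationale."
    )


def build_synthesis_prompt(feedback: Dict[str, str]) -> str:
    """Compose the second-level prompt that merges the six assessments."""
    lines = "\n".join(f"- {dim}: {feedback[dim]}" for dim in SLOWPR)
    return (
        f"{ROLE}\nPer-dimension assessments:\n{lines}\n\n"
        "Give one holistic score (1-5) with a brief summary."
    )


def assess(thesis: str, examples: Dict[str, Sequence[Example]],
           llm: Callable[[str], str]) -> Dict[str, object]:
    """Run the six dimension-level calls, then one holistic synthesis call."""
    feedback = {
        dim: llm(build_dimension_prompt(dim, thesis, examples.get(dim, [])))
        for dim in SLOWPR
    }
    holistic = llm(build_synthesis_prompt(feedback))
    return {"dimensions": feedback, "holistic": holistic}
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client: `assess(thesis_text, examples_by_dimension, my_llm)` makes seven calls in total, six dimension-level prompts and one synthesis prompt, with no fine-tuning involved.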