🤖 AI Summary
Large-scale teaching evaluation in engineering education faces persistent challenges: qualitative student feedback is difficult to integrate, privacy concerns constrain data use, and findings are hard to operationalize. To address these, this study proposes a scalable, AI-augmented framework that integrates large language models (LLMs) with hierarchical summarization, differential privacy-driven anonymization, anomaly detection, and percentile-based visual analytics for multi-source analysis of student, peer, and instructor reflective data. The framework prioritizes formative assessment goals and embeds ethical safeguards to generate high-fidelity, privacy-preserving recommendations for pedagogical improvement. Deployed in a large engineering school, empirical validation shows 92% agreement between LLM-generated summaries and human expert reviews, high faculty adoption rates, and longitudinal evidence of sustained support for instructional refinement and professional development.
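The hierarchical summarization mentioned above can be sketched in miniature: raw comments are condensed in batches, and the batch-level digests are then condensed again into a single course-level summary. This is an illustrative assumption, not the paper's implementation; the `condense` function here is a crude keyword-frequency stand-in for what would be an LLM call in practice, and the names `condense` and `hierarchical_summary` are hypothetical.

```python
from collections import Counter

def condense(texts, top_k=3):
    # Placeholder for an LLM summarization call (assumption): keep the
    # most frequent words longer than 4 characters as a crude theme digest.
    words = Counter(
        w.lower().strip(".,!?")
        for t in texts
        for w in t.split()
        if len(w) > 4
    )
    return " ".join(w for w, _ in words.most_common(top_k))

def hierarchical_summary(comments, batch_size=50):
    # Level 1: condense fixed-size batches of raw comments.
    batches = [comments[i:i + batch_size]
               for i in range(0, len(comments), batch_size)]
    digests = [condense(b) for b in batches]
    # Level 2: condense the batch digests into one course-level summary.
    return condense(digests)
```

The two-level structure is what makes the approach scale: no single summarization call ever sees more than `batch_size` comments, so arbitrarily large comment sets stay within an LLM's context window.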
📝 Abstract
Evaluating teaching effectiveness at scale remains a persistent challenge for large universities, particularly within engineering programs that enroll tens of thousands of students. Traditional methods, such as manual review of student evaluations, are often impractical, leading to overlooked insights and inconsistent data use. This article presents a scalable, AI-supported framework for synthesizing qualitative student feedback using large language models. The system employs hierarchical summarization, anonymization, and exception handling to extract actionable themes from open-ended comments while upholding ethical safeguards. Visual analytics contextualize numeric scores through percentile-based comparisons, historical trends, and instructional load. The approach supports meaningful evaluation and aligns with best practices in qualitative analysis and educational assessment, incorporating student, peer, and self-reflective inputs without automating personnel decisions. We report on its successful deployment across a large college of engineering. Preliminary validation through comparisons with human reviewers, faculty feedback, and longitudinal analysis suggests that LLM-generated summaries can reliably support formative evaluation and professional development. This work demonstrates how AI systems, when designed with transparency and shared governance, can promote teaching excellence and continuous improvement at scale within academic institutions.
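The percentile-based contextualization of numeric scores described in the abstract can be illustrated with a minimal sketch. The idea is that a raw mean score is reported alongside its rank within a comparison cohort, so that, say, a 4.2/5 reads differently in a department averaging 4.5 than in one averaging 3.8. The function name `percentile_rank` is an assumption for illustration, not the system's API.

```python
def percentile_rank(score, cohort_scores):
    # Percentage of cohort scores at or below the given score.
    # This contextualizes a raw mean against its comparison group.
    at_or_below = sum(1 for s in cohort_scores if s <= score)
    return 100.0 * at_or_below / len(cohort_scores)
```

For example, a score of 4.2 within the cohort `[3.8, 4.0, 4.2, 4.5, 4.8]` sits at the 60th percentile, which conveys standing rather than an absolute number.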