Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach

📅 2025-05-12
🤖 AI Summary
This study addresses the high cost and limited scalability of manual coding for classroom discourse quality assessment. We propose an automated, lesson-segment-level evaluation method grounded in three discourse components of the Global Teaching InSights (GTI) observation protocol: Nature of Discourse, Questioning, and Explanations. Our approach is a text-centered, attention-based multimodal fusion model that integrates transcript, audio, and video streams from videotaped lessons. Discourse quality estimation is formulated as an ordinal classification task, and multi-task learning predicts all three components jointly. On the GTI Germany dataset of 92 videotaped mathematics lessons, the model achieves an overall Quadratic Weighted Kappa of 0.384, comparable to human inter-rater reliability (0.326). The results highlight the dominant role of the text modality, show that acoustic features improve consistency with human ratings, and lay the groundwork for automated discourse quality assessment at scale.

📝 Abstract
Classroom discourse is an essential vehicle through which teaching and learning take place. Assessing different characteristics of discursive practices and linking them to student learning achievement enhances the understanding of teaching quality. Traditional assessments rely on manual coding of classroom observation protocols, which is time-consuming and costly. Despite many studies utilizing AI techniques to analyze classroom discourse at the utterance level, investigations into the evaluation of discursive practices throughout an entire lesson segment remain limited. To address this gap, our study proposes a novel text-centered multimodal fusion architecture to assess the quality of three discourse components grounded in the Global Teaching InSights (GTI) observation protocol: Nature of Discourse, Questioning, and Explanations. First, we employ attention mechanisms to capture inter- and intra-modal interactions from transcript, audio, and video streams. Second, a multi-task learning approach is adopted to jointly predict the quality scores of the three components. Third, we formulate the task as an ordinal classification problem to account for rating level order. The effectiveness of these designed elements is demonstrated through an ablation study on the GTI Germany dataset containing 92 videotaped math lessons. Our results highlight the dominant role of text modality in approaching this task. Integrating acoustic features enhances the model's consistency with human ratings, achieving an overall Quadratic Weighted Kappa score of 0.384, comparable to human inter-rater reliability (0.326). Our study lays the groundwork for the future development of automated discourse quality assessment to support teacher professional development through timely feedback on multidimensional discourse practices.
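The agreement metric reported in the abstract, Quadratic Weighted Kappa, can be illustrated with a minimal pure-Python sketch. It assumes ratings are integers coded from 0 to num_levels - 1. The function name, the 0-based coding, and the handling of the degenerate single-level case are illustrative assumptions, not details taken from the paper or the GTI dataset.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_levels):
    """Cohen's kappa with quadratic disagreement weights.

    Ratings are assumed to be integers in 0..num_levels-1 (an assumption
    made for this sketch, not the GTI coding scheme).
    """
    if num_levels < 2:
        raise ValueError("need at least two rating levels")
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and equally long")
    if any(not 0 <= r < num_levels for r in list(rater_a) + list(rater_b)):
        raise ValueError("ratings must lie in 0..num_levels-1")

    n = len(rater_a)
    # Observed confusion matrix: rows are rater A, columns are rater B.
    observed = [[0] * num_levels for _ in range(num_levels)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal histograms give the agreement expected by chance.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_levels))
              for j in range(num_levels)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            # Quadratic weight: larger rating gaps are penalized more heavily.
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0:
        # Both raters used one identical level throughout; this sketch
        # treats that as perfect agreement (conventions differ here).
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement. Because the weights grow with the squared rating gap, a prediction that is off by one level costs much less than one that is off by three.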
Problem

Research questions and friction points this paper is trying to address.

Automating classroom discourse quality assessment at the lesson-segment level to reduce reliance on time-consuming, costly manual coding
Evaluating multiple discourse components (Nature, Questioning, Explanations) jointly
Integrating multimodal data (text, audio, video) for improved assessment accuracy
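One way to picture a text-centered fusion of modalities is cross-attention in which text tokens act as queries and another modality supplies the keys and values. The sketch below is a deliberately simplified illustration: it has no learned projections, uses a plain residual sum, and all function names and toy vectors are hypothetical. It is not the architecture described in the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def cross_attend(queries, context):
    """Scaled dot-product attention with context used as keys and values.

    Returns one context summary per query vector.
    """
    dim = len(queries[0])
    summaries = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in context])
        summaries.append([
            sum(w * v[d] for w, v in zip(weights, context))
            for d in range(dim)
        ])
    return summaries


def text_centered_fusion(text, audio):
    """Enrich each text vector with the audio context it attends to.

    The output keeps the text sequence length, which is what makes the
    fusion "text-centered" in this sketch.
    """
    attended = cross_attend(text, audio)
    return [[t + a for t, a in zip(t_vec, a_vec)]
            for t_vec, a_vec in zip(text, attended)]


# Toy example: two text tokens, three audio frames, embedding size 2.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
audio_frames = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
fused = text_centered_fusion(text_tokens, audio_frames)
```

The same function could be applied with video features as the context. Keeping text as the query side means the other modalities can only refine the text representation rather than replace it.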
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention mechanisms capture inter- and intra-modal interactions across transcript, audio, and video streams
Multi-task learning predicts three discourse components
Ordinal classification accounts for rating level order
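The last two ideas can be sketched together. A common way to respect rating order is to encode a level as a series of binary "is the rating above threshold j?" targets, and a simple multi-task objective averages one such loss per discourse component. This is one standard formulation chosen for illustration; the paper's actual ordinal loss and task weighting are not specified here, and the four-level scale, function names, and probabilities below are assumptions.

```python
import math


def ordinal_targets(level, num_levels):
    """Encode level k as num_levels-1 binary answers to 'rating > j?'.

    Example with 4 levels: level 2 becomes [1.0, 1.0, 0.0].
    """
    return [1.0 if level > j else 0.0 for j in range(num_levels - 1)]


def decode_level(probs, threshold=0.5):
    """Predicted level is the number of thresholds judged as passed."""
    return sum(1 for p in probs if p > threshold)


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over the threshold outputs."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)


def multitask_loss(predictions, labels, num_levels):
    """Unweighted mean of the per-task ordinal losses (an assumption)."""
    losses = [
        binary_cross_entropy(predictions[task],
                             ordinal_targets(labels[task], num_levels))
        for task in labels
    ]
    return sum(losses) / len(losses)


# Hypothetical threshold probabilities for one lesson segment.
example_predictions = {
    "nature_of_discourse": [0.9, 0.8, 0.1],
    "questioning": [0.9, 0.2, 0.1],
    "explanations": [0.9, 0.9, 0.7],
}
example_levels = {task: decode_level(p)
                  for task, p in example_predictions.items()}
```

With this encoding, predicting level 1 when the true level is 3 gets two thresholds wrong, while predicting level 2 gets only one wrong, so the loss reflects how far off the prediction is. In a neural model the three tasks would typically share an encoder and differ only in their output heads.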