🤖 AI Summary
Existing methods for predicting student classroom engagement rely heavily on large amounts of labeled data and overlook the contextual influence of peer behaviors, which limits their applicability in privacy-sensitive and behaviorally diverse real-world teaching environments. This work proposes a three-stage framework. First, a vision-language model (VLM) is fine-tuned in a few-shot setting to recognize individual student actions. Second, each student's video is divided into segments with a sliding temporal window, and the fine-tuned VLM labels each segment to produce a sequence of actions. Third, peer behaviors are incorporated as contextual cues, and a large language model (LLM) classifies the entire sequence as engaged or disengaged. By integrating peer context into engagement modeling and combining the VLM’s few-shot perception with the LLM’s sequential reasoning, the approach targets settings with few labels and diverse behaviors; the authors report experimental results that demonstrate its effectiveness.
📝 Abstract
Understanding student behavior in the classroom is essential to improving both pedagogical quality and student engagement. Existing methods for predicting student engagement typically require substantial annotated data to model the diversity of student behaviors, yet privacy concerns often restrict researchers to their own proprietary datasets. Moreover, the classroom context, as reflected in peers' actions, is typically ignored. To address these limitations, we propose a novel three-stage framework for video-based student engagement measurement. First, we explore few-shot adaptation of a vision-language model (VLM) for student action recognition, fine-tuning it to distinguish among action categories from only a few training samples. Second, to handle continuous and unpredictable student actions, we use a sliding temporal window to divide each student's 2-minute video into non-overlapping segments. The fine-tuned VLM assigns an action category to each segment, producing a sequence of action predictions. Finally, we use a large language model (LLM) to classify this entire sequence of actions, together with the classroom context, as belonging to an engaged or disengaged student. The experimental results demonstrate the effectiveness of the proposed approach in identifying student engagement.
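
The second and third stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 10-second window length, the action labels, the prompt wording, and the function names (`segment_windows`, `action_sequence`, `build_engagement_prompt`) are all assumptions made for illustration, and the fine-tuned VLM is replaced by a stub classifier. The sketch only shows how a 2-minute video could be split into non-overlapping windows, how per-window predictions form a sequence, and how that sequence might be combined with peer actions into a text prompt for an LLM.

```python
import math
from typing import Callable, List, Sequence, Tuple

Window = Tuple[float, float]  # (start_s, end_s)


def segment_windows(duration_s: float, window_s: float) -> List[Window]:
    """Split [0, duration_s] into consecutive, non-overlapping windows.

    The last window is shortened if duration_s is not a multiple of window_s.
    """
    if duration_s <= 0 or window_s <= 0:
        raise ValueError("duration_s and window_s must be positive")
    n = math.ceil(duration_s / window_s)
    return [(i * window_s, min((i + 1) * window_s, duration_s)) for i in range(n)]


def action_sequence(
    windows: Sequence[Window],
    classify_segment: Callable[[float, float], str],
) -> List[str]:
    """Apply a per-segment classifier (a stand-in for the fine-tuned VLM)."""
    return [classify_segment(start, end) for start, end in windows]


def build_engagement_prompt(
    student_actions: Sequence[str],
    peer_actions: Sequence[Sequence[str]],
) -> str:
    """Assemble a hypothetical LLM prompt from one student's action sequence
    and the peers' sequences, which serve as classroom context."""
    lines = [
        "Classify the target student as 'engaged' or 'disengaged'.",
        "Target student actions, one per time window:",
        ", ".join(student_actions),
        "Peer actions in the same windows (classroom context):",
    ]
    for i, actions in enumerate(peer_actions, start=1):
        lines.append(f"peer {i}: " + ", ".join(actions))
    return "\n".join(lines)


# Hypothetical stub: a real system would run the fine-tuned VLM on the frames
# between start and end. The labels here are illustrative only.
def stub_classifier(start: float, end: float) -> str:
    return "writing" if start < 60 else "looking away"


windows = segment_windows(duration_s=120, window_s=10)  # assumed 10 s window
student = action_sequence(windows, stub_classifier)
peers = [["writing"] * len(windows), ["listening"] * len(windows)]
prompt = build_engagement_prompt(student, peers)
```

The resulting `prompt` string would then be sent to an LLM of choice. Keeping the prompt builder separate from the classifier makes it possible to vary the peer context independently of the per-student action predictions.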