Context Matters: Peer-Aware Student Behavioral Engagement Measurement via VLM Action Parsing and LLM Sequence Classification

📅 2026-01-09
🤖 AI Summary
Existing methods for predicting student classroom engagement rely on large amounts of labeled data and overlook the contextual influence of peer behavior, which limits their applicability in privacy-sensitive, behaviorally diverse classrooms. This work proposes a three-stage framework. First, a vision-language model (VLM) is fine-tuned on a few training samples to recognize individual student actions. Second, each student's video is divided into non-overlapping segments with a sliding temporal window, and the fine-tuned VLM labels each segment to produce an action sequence. Third, a large language model (LLM) classifies the entire sequence, together with peers' actions as classroom context, as belonging to an engaged or disengaged student. The reported experiments indicate that combining the VLM's few-shot perception with the LLM's sequence-level reasoning is effective for identifying student engagement when labeled data is scarce.
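
As a rough illustration of the third stage, the sketch below assembles a text prompt from one student's action sequence plus peers' actions, and maps a free-text LLM reply to a binary label. It is not the authors' prompt or code: the function names, the prompt wording, and the example action labels are all hypothetical, and the call to an actual LLM is left out.

```python
from typing import Dict, List


def build_engagement_prompt(
    student_actions: List[str],
    peer_actions: Dict[str, List[str]],
) -> str:
    """Build a prompt for sequence-level engagement classification.

    Assumption: actions are plain-text labels, one per video segment,
    and peers are keyed by an anonymous identifier.
    """
    lines = [
        "Classify the target student as 'engaged' or 'disengaged'.",
        "Target student actions (in time order): " + ", ".join(student_actions),
        "Peer actions over the same period (classroom context):",
    ]
    for peer_id, actions in sorted(peer_actions.items()):
        lines.append(f"- {peer_id}: " + ", ".join(actions))
    lines.append("Answer with a single word.")
    return "\n".join(lines)


def parse_label(reply: str) -> str:
    """Map a free-text reply to a label.

    'disengaged' is checked first because it contains 'engaged'.
    """
    text = reply.strip().lower()
    if "disengaged" in text:
        return "disengaged"
    if "engaged" in text:
        return "engaged"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical action labels, for illustration only.
    prompt = build_engagement_prompt(
        ["writing", "looking at board", "using phone"],
        {"peer_1": ["writing", "writing", "writing"]},
    )
    print(prompt)
    print(parse_label("Disengaged."))
```

Including the peers' sequences in the same prompt is what lets the classifier judge an action relative to what the rest of the class is doing, rather than in isolation.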

📝 Abstract
Understanding student behavior in the classroom is essential for improving both pedagogical quality and student engagement. Existing methods for predicting student engagement typically require substantial annotated data to model the diversity of student behaviors, yet privacy concerns often restrict researchers to their own proprietary datasets. Moreover, the classroom context, as reflected in peers' actions, is ignored. To address these limitations, we propose a novel three-stage framework for video-based student engagement measurement. First, we explore few-shot adaptation of a vision-language model (VLM) for student action recognition: the model is fine-tuned to distinguish among action categories using only a few training samples. Second, to handle continuous and unpredictable student actions, we use a sliding temporal window to divide each student's 2-minute video into non-overlapping segments. The fine-tuned VLM assigns each segment an action category, generating a sequence of action predictions. Finally, we use a large language model (LLM) to classify this entire sequence of actions, together with the classroom context, as belonging to an engaged or disengaged student. The experimental results demonstrate the effectiveness of the proposed approach in identifying student engagement.
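
The second stage can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the 10-second window length is an assumption (the abstract gives only the 2-minute video length), and `classify_segment` is a hypothetical stand-in for the fine-tuned VLM.

```python
from typing import Callable, List, Tuple


def segment_windows(duration_s: float, window_s: float) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into non-overlapping windows.

    The stride equals the window length, so windows never overlap.
    The last window is shortened if the duration is not a multiple
    of the window length.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + window_s, duration_s)
        windows.append((start, end))
        start += window_s
    return windows


def action_sequence(
    windows: List[Tuple[float, float]],
    classify_segment: Callable[[float, float], str],
) -> List[str]:
    """Label each window with an action category, preserving time order."""
    return [classify_segment(start, end) for start, end in windows]


if __name__ == "__main__":
    # 2-minute video; the 10 s window length is an assumed value.
    windows = segment_windows(120.0, 10.0)

    # Hypothetical placeholder for the fine-tuned VLM.
    def classify_segment(start: float, end: float) -> str:
        return "writing" if start < 60 else "using phone"

    print(len(windows))
    print(action_sequence(windows, classify_segment))
```

The resulting list of per-segment labels is the action sequence that the third stage passes to the LLM.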
Problem

Research questions and friction points this paper is trying to address.

student engagement
classroom context
peer behavior
behavioral measurement
privacy constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language model
few-shot adaptation
large language model
behavioral engagement
classroom context
Ahmed Abdelkawy
University of Louisville, Louisville, KY, USA
Ahmed Elsayed
Computational Mechanics, The Technical University of Munich
Optimization
Asem Ali
University of Louisville, Louisville, KY, USA
Aly Farag
Professor of Electrical and Computer Engineering, University of Louisville
Computer Vision, Medical Imaging, Biometrics
Thomas Tretter
University of Louisville, Louisville, KY, USA
Michael McIntyre
University of Louisville, Louisville, KY, USA