🤖 AI Summary
This study addresses the bottleneck of high-cost manual annotation in the automated detection of teachers' visual attention for classroom behavior analysis. We propose a lightly labeled automation pipeline that integrates a domain-adapted classroom face recognition model, built on YOLO/RetinaFace detectors and FaceNet-based feature embeddings, with mobile eye-tracking data to construct a gaze-to-student mapping algorithm, enabling non-intrusive, low-annotation-cost analysis of teacher attention distribution. The key contribution is the joint modeling of domain-adaptive face recognition and dynamic gaze trajectories, which substantially reduces annotation requirements. Evaluated across four real-world classroom settings, the method reaches reasonable performance in all setups, with the best results in U-shaped and small classrooms (accuracies of approximately 0.7 and 0.9, respectively). The framework provides scalable technical support for student engagement assessment and teacher professional development.
📝 Abstract
Teachers' visual attention and its distribution across students in the classroom can have important implications for student engagement, achievement, and professional teacher training. Nevertheless, inferring where teachers look and which student they focus on is not trivial. Mobile eye tracking can help solve this problem; however, using mobile eye tracking alone requires a significant amount of manual annotation. To address this limitation, we present an automated processing pipeline concept that requires minimal manually annotated data to recognize which student a teacher focuses on. To this end, we use state-of-the-art face detection models and face recognition feature embeddings to train face recognition models with transfer learning in the classroom context, and we combine these models with the teachers' gaze from mobile eye trackers. We evaluated our approach with data collected from four different classrooms. Our results show that, while it is possible to estimate the visually focused students with reasonable performance in all of our classroom setups, U-shaped and small classrooms led to the best results, with accuracies of approximately 0.7 and 0.9, respectively. We did not evaluate our method for teacher-student interactions and focused instead on the validity of the technical approach. Still, because our methodology does not require a vast amount of manually annotated data and offers a non-intrusive way of handling teachers' visual attention, it could help improve instructional strategies, enhance classroom management, and provide feedback for professional teacher development.
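
The step of combining recognized faces with the teacher's gaze can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Face` record, the `margin` tolerance, the nearest-center tie-break, and the function names `gaze_to_student` and `attention_distribution` are all assumptions made for illustration. The sketch assumes that, for each scene-camera frame, a face recognition model has already produced labeled bounding boxes and the eye tracker has supplied a gaze point in the same pixel coordinates.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Face:
    """One recognized face in a frame (hypothetical record, pixel coordinates)."""
    student_id: str  # label assumed to come from a face recognition model
    x1: float
    y1: float
    x2: float
    y2: float


def gaze_to_student(gaze_x: float, gaze_y: float, faces: List[Face],
                    margin: float = 20.0) -> Optional[str]:
    """Return the student whose face box, enlarged by `margin` pixels,
    contains the gaze point. If several boxes qualify, pick the one whose
    center is nearest to the gaze point. Return None if no box qualifies."""
    best_id, best_dist = None, float("inf")
    for f in faces:
        inside = (f.x1 - margin <= gaze_x <= f.x2 + margin
                  and f.y1 - margin <= gaze_y <= f.y2 + margin)
        if not inside:
            continue
        cx, cy = (f.x1 + f.x2) / 2, (f.y1 + f.y2) / 2
        dist = (gaze_x - cx) ** 2 + (gaze_y - cy) ** 2  # squared distance suffices
        if dist < best_dist:
            best_id, best_dist = f.student_id, dist
    return best_id


def attention_distribution(labels: Iterable[Optional[str]]) -> Dict[str, float]:
    """Share of on-student frames per student; frames labeled None are ignored."""
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()} if total else {}
```

Applying `gaze_to_student` frame by frame and passing the resulting labels to `attention_distribution` gives a rough estimate of how a teacher's visual attention is spread across students. The margin is an illustrative way to tolerate small gaze-estimation errors and would need tuning for a given eye tracker and camera resolution.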