🤖 AI Summary
This study addresses the cost and labor of manually annotating multimodal eye-tracking and video data from child-caregiver interactions, which limits large-scale and longitudinal research. The authors propose a deep-learning-based toolkit that supports three preprocessing and feature-extraction steps: post-hoc synchronization across multiple videos, semi-automatic annotation of gaze target categories, and categorization of participants' poses and hand actions. The toolkit targets recordings of naturalistic behavior and is intended to make the analysis of attentional dynamics more efficient. By improving the efficiency and scalability of feature extraction, it supports large-scale and longitudinal studies of early human development.
📝 Abstract
Video recordings of child-caregiver interactions enable investigation of attentional dynamics during naturalistic behavior. Such multimodal recordings also allow researchers to examine how attention interacts with action and language use in real time. However, manual annotation of such data is time-consuming. Here, we introduce the GazeBehavior Annotation Toolkit, a deep-learning-based toolkit designed to facilitate three key processes in data preprocessing and feature extraction: post-hoc synchronization across multiple videos, semi-automatic annotation of gaze target categories, and categorization of participants' poses and hand actions. The toolkit improves the efficiency and scalability of feature extraction from human egocentric eye-tracking and video data. This improvement is critical for supporting large-scale and longitudinal investigations of attentional dynamics and naturalistic behavior in early human development.