🤖 AI Summary
This study addresses smartphone-induced attention distraction in online learning by proposing a non-intrusive, multimodal, real-time detection method. We introduce the first integration of heterogeneous signals (electroencephalography (EEG), heart rate derived from photoplethysmography (PPG), and head pose) within a lightweight deep learning framework that combines an LSTM with attention mechanisms and is supported by multimodal feature alignment and physiological signal processing techniques. The fused model achieves 91% overall accuracy, outperforming the unimodal baselines (head pose alone: 87%; EEG or PPG alone: below 75%). It supports low-latency edge deployment and has been validated in authentic online classroom settings. Key contributions include: (i) a cross-modal paradigm for fusing physiological and behavioral signals, designed specifically for educational contexts; and (ii) a practical, lightweight, real-time distraction detection framework suitable for resource-constrained edge devices.
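The summary mentions an LSTM combined with attention but does not say how the attention layer is built. One common design, assumed here and not confirmed by the source, is attention pooling: each per-timestep hidden state is scored against a query vector (learned during training; passed in as a plain list here), the scores are softmax-normalized, and the sequence collapses into a single context vector for the classifier. The function names `softmax` and `attention_pool` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: Sequence[Sequence[float]],
                   query: Sequence[float]) -> List[float]:
    """Collapse a sequence of per-timestep vectors (e.g. LSTM outputs) into one vector.

    Hypothetical sketch: each timestep is scored by a dot product with `query`,
    scores become weights via softmax, and the result is the weighted sum.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [0.0] * dim
    for w, h in zip(weights, hidden_states):
        for j in range(dim):
            context[j] += w * h[j]
    return context
```

With an all-zero query every timestep receives equal weight, so `attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])` averages the two states; a query aligned with one state concentrates the weight on that timestep, which is what lets the model emphasize the moments most indicative of phone use.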
📝 Abstract
This work investigates the use of multimodal biometrics to detect distractions caused by smartphone use during tasks that require sustained attention, with a focus on computer-based online learning. Although the methods are applicable to various domains, such as autonomous driving, we concentrate on the challenges learners face in maintaining engagement amid internal (e.g., motivation), system-related (e.g., course design), and contextual (e.g., smartphone use) factors. Traditional learning platforms often lack detailed behavioral data, but Multimodal Learning Analytics (MMLA) and biosensors provide new insights into learner attention. We propose an AI-based approach that leverages physiological signals and head pose data to detect phone use. Our results show that single biometric signals, such as brain waves or heart rate, offer limited accuracy, while head pose alone achieves 87%. A multimodal model combining all signals reaches 91% accuracy, highlighting the benefits of integration. We conclude by discussing the implications and limitations of deploying these models for real-time support in online learning environments.
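Combining EEG, heart rate, and head pose requires putting streams recorded at different sampling rates onto a shared time base before a model can consume them. The abstract does not describe this step, so the following is only a minimal sketch of one generic option, assumed here: average each timestamped stream over fixed, non-overlapping windows and concatenate the per-window values (early fusion). The stream names, the 1-second window, and the functions `window_means` and `fuse_streams` are hypothetical, and each modality is reduced to a single scalar for brevity.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def window_means(samples: Sequence[Sample], start: float, end: float,
                 window: float) -> List[float]:
    """Average a timestamped stream over windows [start + k*window, start + (k+1)*window).

    Samples outside [start, end) are ignored; windows with no samples yield NaN.
    """
    n_windows = int(math.floor((end - start) / window))
    sums = [0.0] * n_windows
    counts = [0] * n_windows
    for t, v in samples:
        if t < start or t >= end:
            continue
        k = int((t - start) // window)
        if k < n_windows:  # guards a partial trailing window
            sums[k] += v
            counts[k] += 1
    return [s / c if c else math.nan for s, c in zip(sums, counts)]


def fuse_streams(streams: Dict[str, Sequence[Sample]], start: float, end: float,
                 window: float) -> List[List[float]]:
    """Early fusion: one feature vector per window, one entry per stream (sorted by name)."""
    names = sorted(streams)
    per_stream = [window_means(streams[name], start, end, window) for name in names]
    return [list(row) for row in zip(*per_stream)]


# Hypothetical example: EEG at 2 Hz, heart rate at 1 Hz, head yaw at irregular times.
streams = {
    "eeg": [(0.0, 1.0), (0.5, 3.0), (1.0, 5.0), (1.5, 7.0)],
    "heart_rate": [(0.2, 70.0), (1.2, 80.0)],
    "head_yaw": [(0.1, -5.0), (0.6, 5.0), (1.1, 20.0)],
}
features = fuse_streams(streams, start=0.0, end=2.0, window=1.0)
```

Here `features` holds one row per second, ordered as `eeg`, `head_yaw`, `heart_rate`. Averaging is the simplest choice; interpolation or richer per-window features (such as EEG band powers) are equally plausible, and NaN windows would need imputation before training.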