🤖 AI Summary
This study addresses the critical need for real-time, non-invasive cognitive load monitoring in safety-critical scenarios, where existing approaches are often hindered by invasiveness, high cost, or poor temporal resolution. The authors propose a practical solution leveraging a standard RGB camera: facial keypoints extracted via OpenPose are converted into linear kinematic features—velocity, acceleration, and displacement—and recurrence quantification measures, which are used to train random forest classifiers. The work presents the first systematic investigation of the multiscale dynamic effects of cognitive load on facial motion. Classifiers trained on pose kinematics substantially outperform conventional task-performance metrics (85% vs. 55% accuracy), and personalized models reach 50% accuracy after only two minutes of calibration per condition, rising to 73% with additional data. However, cross-subject generalization remains limited (43% vs. 33% chance), indicating a need for further refinement in model transferability.
📝 Abstract
Real-time cognitive workload monitoring is crucial in safety-critical environments, yet established measures are intrusive, expensive, or lack temporal resolution. We tested whether facial movement dynamics from a standard webcam could provide a low-cost alternative. Seventy-two participants completed a multitasking simulation (OpenMATB) under varied load while facial keypoints were tracked via OpenPose. Linear kinematics (velocity, acceleration, displacement) and recurrence quantification features were extracted. Increasing load altered dynamics across timescales: movement magnitudes rose, temporal organisation fragmented then reorganised into complex patterns, and eye-head coordination weakened. Random forest classifiers trained on pose kinematics outperformed task performance metrics (85% vs. 55% accuracy) but generalised poorly across participants (43% vs. 33% chance). Participant-specific models reached 50% accuracy with minimal calibration (2 minutes per condition), improving continuously to 73% without plateau. Facial movement dynamics sensitively track workload with brief calibration, enabling adaptive interfaces using commodity cameras, though individual differences limit cross-participant generalisation.
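The feature pipeline the abstract describes—per-frame linear kinematics from tracked keypoints plus a recurrence-based measure of temporal organisation—can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the function names, the 30 fps frame rate, the keypoint-averaging step, and the simple recurrence-rate threshold are all assumptions for the sake of a runnable example.

```python
import numpy as np

def kinematic_features(keypoints, fps=30.0):
    """Illustrative linear kinematics from tracked facial keypoints.

    keypoints: array of shape (frames, points, 2), pixel coordinates
    (e.g. OpenPose face output). Returns per-frame displacement,
    velocity, and acceleration magnitudes, averaged over keypoints.
    """
    dt = 1.0 / fps
    # Euclidean displacement of each keypoint between consecutive frames.
    disp = np.linalg.norm(np.diff(keypoints, axis=0), axis=2)  # (frames-1, points)
    vel = disp / dt                                            # speed per keypoint
    acc = np.abs(np.diff(vel, axis=0)) / dt                    # (frames-2, points)
    return disp.mean(axis=1), vel.mean(axis=1), acc.mean(axis=1)

def recurrence_rate(signal, radius=0.1):
    """Simplest recurrence-quantification feature: the fraction of
    time-point pairs whose values fall within `radius` of each other.
    A stand-in for the fuller RQA measures used in the paper."""
    d = np.abs(signal[:, None] - signal[None, :])
    return float((d < radius).mean())

# Toy usage with synthetic keypoint trajectories (100 frames, 70 points).
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 70, 2)).cumsum(axis=0) * 0.1
disp, vel, acc = kinematic_features(pts)
rr = recurrence_rate(vel)
```

In the study these per-window features would feed a random forest (e.g. scikit-learn's `RandomForestClassifier`) trained per participant; the sketch covers only the feature side, since the classification step is standard.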