🤖 AI Summary
This work addresses the scarcity of ecologically valid multimodal datasets, which hinders in-depth analysis of students' oral presentation skills and the development of automated feedback systems. To bridge this gap, we introduce the SOPHIAS dataset, collected in authentic classroom settings and comprising 50 oral presentations with subsequent Q&A sessions delivered by 65 undergraduate and master's students. The dataset integrates eight synchronized sensor modalities—including high-definition audio-video, eye-tracking, physiological signals (via smartwatches), interaction logs (keyboard, mouse, and clicker data), and presentation slides—and is annotated with standardized evaluations from instructors, peers, and self-assessments. Spanning approximately 12 hours of recordings, SOPHIAS is publicly available on GitHub and the Science Data Bank, offering a high-ecological-validity benchmark resource for multimodal learning analytics, automated feedback generation, and peer assessment research.
📝 Abstract
Oral presentation skills are a critical component of higher education, yet comprehensive datasets capturing real-world student performance across multiple modalities remain scarce. To address this gap, we present SOPHIAS (Student Oral Presentation monitoring for Holistic Insights & Analytics using Sensors), a 12-hour multimodal dataset containing recordings of 50 oral presentations (a 10-15 minute presentation followed by a 5-15 minute Q&A) delivered by 65 undergraduate and master's students at the Universidad Autónoma de Madrid. SOPHIAS integrates eight synchronized sensor streams from high-definition webcams, ambient and webcam audio, eye-tracking glasses, smartwatch physiological sensors, and clicker, keyboard, and mouse interactions. In addition, the dataset includes presentation slides and rubric-based evaluations from teachers and peers, as well as self-assessments, along with timestamped contextual annotations. The presentations were conducted in real classroom settings, preserving authentic student behaviors, interactions, and physiological responses. SOPHIAS enables the exploration of relationships between multimodal behavioral and physiological signals and presentation performance, supports the study of peer assessment, and provides a benchmark for developing automated feedback and Multimodal Learning Analytics tools. The dataset is publicly available for research through GitHub and the Science Data Bank.
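Since the sensor streams above are synchronized via timestamps but sampled at different rates (e.g. smartwatch physiological signals vs. eye-tracking gaze), a common first step in working with such a dataset is aligning modalities on a shared timeline. The sketch below illustrates this with synthetic stand-in data; the actual SOPHIAS file formats, column names, and sampling rates are not specified here and should be taken from the dataset's documentation on GitHub or the Science Data Bank.

```python
# Hypothetical sketch: aligning two SOPHIAS-style sensor streams by timestamp.
# Column names, sampling rates, and values below are illustrative assumptions,
# not the dataset's actual schema.
import pandas as pd

# Synthetic stand-ins for two modalities at different rates:
# smartwatch heart rate (~1 Hz) and eye-tracking gaze (faster).
heart_rate = pd.DataFrame({
    "t": pd.to_timedelta([0, 1, 2, 3], unit="s"),
    "bpm": [72, 74, 73, 75],
})
gaze = pd.DataFrame({
    "t": pd.to_timedelta([0.0, 0.25, 0.5, 0.75, 1.0, 1.25], unit="s"),
    "gaze_x": [0.10, 0.20, 0.15, 0.30, 0.25, 0.40],
})

# Pair each gaze sample with the most recent heart-rate reading,
# producing one row per gaze sample on a common timeline.
aligned = pd.merge_asof(gaze, heart_rate, on="t", direction="backward")
print(aligned)
```

An `asof` merge keeps the faster stream's resolution while carrying the slower stream's last-known value forward, which is usually preferable to naive resampling when physiological signals change slowly relative to gaze.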