A 1000-hour EEG-EMG-audio dataset of Japanese speech production

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses a gap in research on the neural mechanisms of speech production and speech-based brain–computer interfaces by presenting a large-scale, multimodal dataset comprising 1,020 hours of synchronized high-density EEG (62–128 channels), facial EMG, and high-fidelity audio recordings from three healthy native Japanese speakers during open-vocabulary overt speech. The data were collected across many sessions over several months using three EEG systems and include time-aligned transcripts and speech-event annotations. The dataset follows the BIDS standard and has been publicly released via OpenNeuro under a CC0 license. Validation analyses confirm the expected neurophysiological signatures: a canonical 1/f spectral profile, task-related alpha suppression, and time-locked evoked responses. The summary describes this resource as the first open, multimodal speech dataset at the thousand-hour scale, enabling cross-device, longitudinal, and open-vocabulary speech decoding research.

📝 Abstract

We present a multimodal dataset of 1020 hours of simultaneously recorded scalp electroencephalography (EEG), facial electromyography (EMG), and speech audio from three healthy native Japanese speakers during open-vocabulary overt speech. Recordings were acquired with three EEG systems (an ultra-high-density system, g.Pangolin, and two cap-type systems, g.SCARABEO and eegosports), spanning 62–128 channels, across many sessions over several months. Each session provides time-synchronized EEG, facial EMG, and audio, together with speech-event annotations and transcriptions. Although collected with speech decoding as a primary motivation, the dataset also supports work on multimodal signal processing, artifact modeling, longitudinal and cross-device adaptation, and EEG representation learning. Technical validation included power spectral density and event-related potential analyses across participants, devices, and tasks, which showed the expected 1/f spectral profile, task-related alpha-band attenuation, and time-locked evoked responses. The dataset is released in Brain Imaging Data Structure (BIDS) format via OpenNeuro under a CC0 waiver to support both speech-related and broader EEG research.
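
The event-related potential analysis mentioned in the abstract rests on time-locked averaging: cut a fixed window around each annotated event, subtract a pre-event baseline, and average across events so that activity unrelated to the event cancels out. The sketch below illustrates that idea on a synthetic single-channel signal. It is not the authors' validation pipeline; the function name `epoch_average`, the 250 Hz sampling rate, the window lengths, and the simulated 100 ms response are all assumptions chosen for illustration.

```python
import math
import random


def epoch_average(signal, onsets, pre, post):
    """Baseline-correct and average windows of `signal` around each onset.

    `pre` and `post` are window lengths in samples before and after onset.
    Returns (averaged waveform, number of epochs used).
    """
    total = [0.0] * (pre + post)
    count = 0
    for onset in onsets:
        start, stop = onset - pre, onset + post
        if start < 0 or stop > len(signal):
            continue  # skip events too close to the recording edges
        segment = signal[start:stop]
        baseline = sum(segment[:pre]) / pre  # mean of the pre-event interval
        for i, value in enumerate(segment):
            total[i] += value - baseline
        count += 1
    if count == 0:
        raise ValueError("no complete epochs found")
    return [v / count for v in total], count


# Synthetic recording: 120 s of unit-variance noise (assumed, not real data).
fs = 250  # assumed sampling rate in Hz
rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(fs * 120)]
onsets = list(range(fs, len(signal) - fs, fs))  # one hypothetical event per second

# Add a simulated evoked response: a Gaussian bump peaking 100 ms after onset.
peak_delay = 25  # samples, i.e. 100 ms at 250 Hz
for onset in onsets:
    for k in range(50):
        signal[onset + k] += 2.0 * math.exp(-0.5 * ((k - peak_delay) / 5.0) ** 2)

pre, post = 50, 100  # 200 ms baseline, 400 ms post-event window
erp, n_epochs = epoch_average(signal, onsets, pre, post)
peak_index = max(range(len(erp)), key=lambda i: erp[i])
peak_latency_ms = (peak_index - pre) * 1000.0 / fs
print(f"{n_epochs} epochs, peak at about {peak_latency_ms:.0f} ms after onset")
```

With real recordings, the onsets would come from the dataset's speech-event annotations rather than a regular grid, and a dedicated EEG library would normally handle filtering, artifact rejection, and multi-channel epochs.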

Problem

Research questions and friction points this paper is trying to address.

EEG

EMG

speech production

multimodal dataset

Japanese

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal dataset

EEG-EMG-audio synchronization

speech decoding