NeuroVoz: a Castillian Spanish corpus of parkinsonian speech

📅 2024-03-04

🏛️ Scientific Data

📈 Citations: 3

✨ Influential: 0

🤖 AI Summary

Research on voice-based screening for Parkinson’s disease (PD) is severely hindered by the scarcity of publicly available non-English speech datasets, particularly for Castilian Spanish, which impedes methodological reproducibility and cross-linguistic validation. To address this gap, we present the first open-access Castilian Spanish speech corpus specifically designed for PD assessment, comprising 12 hours of high-fidelity recordings from 127 participants (62 PD patients, 65 healthy controls). Recordings were acquired using professional microphone arrays, aligned with UPDRS clinical scores, processed via Praat-based acoustic analysis, and annotated per ISO/IEC 23053 metadata standards. We also introduce a clinically interpretable acoustic–motor co-annotation framework to systematically characterize multidimensional vocal impairments. The dataset is publicly released and has already enabled three independent validation studies on PD voice biomarkers, filling a critical gap in non-English neurospeech resources.