🤖 AI Summary
Public voice-based biomarker datasets for mental health and neurodegenerative disorders exhibit heterogeneous FAIR (Findable, Accessible, Interoperable, Reusable) compliance and limited clinical translatability. To address this, we systematically evaluated 27 publicly available datasets and proposed the first priority-weighted FAIR maturity scoring framework tailored to voice biomarkers, enabling multidimensional quantification at the sub-principle, principle, and composite levels. Results reveal widespread deficiencies in Accessibility, Interoperability, and Reusability; repository compliance and adoption of domain-specific metadata standards emerged as critical determinants of FAIR performance. This framework provides an actionable, evidence-based roadmap for enhancing data quality and accelerating clinical translation. By establishing standardized, transparent, and reproducible evaluation criteria, it advances the rigor, interoperability, and trustworthiness of voice biomarker research.
📝 Abstract
Voice biomarkers (human-generated acoustic signals such as speech, coughing, and breathing) are promising tools for scalable, non-invasive detection and monitoring of mental health and neurodegenerative diseases. Yet their clinical adoption remains constrained by the inconsistent quality and limited usability of publicly available datasets. To address this gap, we present the first systematic FAIR (Findable, Accessible, Interoperable, Reusable) evaluation of 27 publicly available voice biomarker datasets focused on these disease areas. Using the FAIR Data Maturity Model and a structured, priority-weighted scoring method, we assessed FAIRness at the sub-principle, principle, and composite levels. Our analysis revealed consistently high Findability but substantial variability and weaknesses in Accessibility, Interoperability, and Reusability. Mental health datasets exhibited greater variability in FAIR scores, while neurodegenerative datasets were slightly more consistent. Repository choice also significantly influenced FAIRness scores. To enhance dataset quality and clinical utility, we recommend adopting structured, domain-specific metadata standards, prioritizing FAIR-compliant repositories, and routinely applying structured FAIR evaluation frameworks. These findings provide actionable guidance to improve dataset interoperability and reuse, thereby accelerating the clinical translation of voice biomarker technologies.
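The priority-weighted aggregation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the indicator names, per-indicator scores, and priority weights below are hypothetical placeholders, not the paper's actual rubric or values, and serve only to show how sub-principle scores roll up into principle-level and composite FAIR scores via weighted means.

```python
# Hypothetical example of priority-weighted FAIR scoring.
# Each sub-principle indicator maps to (score in [0, 1], priority weight);
# all numbers below are placeholders, not values from the study.
fair_indicators = {
    "Findable": {"F1": (1.0, 3), "F2": (0.8, 2), "F3": (1.0, 2), "F4": (1.0, 3)},
    "Accessible": {"A1": (0.6, 3), "A2": (0.4, 1)},
    "Interoperable": {"I1": (0.5, 2), "I2": (0.3, 2), "I3": (0.4, 1)},
    "Reusable": {"R1": (0.5, 3)},
}

def principle_score(indicators):
    """Weighted mean of sub-principle scores within one FAIR principle."""
    total_weight = sum(w for _, w in indicators.values())
    return sum(s * w for s, w in indicators.values()) / total_weight

def composite_score(dataset):
    """Weighted mean over every indicator: the holistic FAIR score."""
    flat = [(s, w) for inds in dataset.values() for s, w in inds.values()]
    return sum(s * w for s, w in flat) / sum(w for _, w in flat)

per_principle = {p: round(principle_score(inds), 3)
                 for p, inds in fair_indicators.items()}
print(per_principle)
print(round(composite_score(fair_indicators), 3))
```

With these placeholder weights, high-priority indicators dominate each principle's score, so a dataset can only reach a strong composite score by satisfying the essential indicators, mirroring the maturity-model intent of prioritized compliance levels.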