Systematic FAIRness Assessment of Open Voice Biomarker Datasets for Mental Health and Neurodegenerative Diseases

📅 2025-08-14
🤖 AI Summary
Public voice-based biomarker datasets for mental health and neurodegenerative disorders exhibit heterogeneous FAIR (Findable, Accessible, Interoperable, Reusable) compliance and limited clinical translatability. To address this, we systematically evaluated 27 publicly available datasets using the FAIR Data Maturity Model and a priority-weighted scoring method, which we present as the first systematic FAIR evaluation in this area. The method quantifies FAIRness at the sub-principle, principle, and composite levels. Results show consistently high Findability but widespread weaknesses in Accessibility, Interoperability, and Reusability; repository choice significantly influenced FAIR scores. We recommend adopting domain-specific metadata standards, prioritizing FAIR-compliant repositories, and routinely applying structured FAIR evaluations, which together give an actionable roadmap for improving data quality and accelerating clinical translation of voice biomarker research.

📝 Abstract
Voice biomarkers (human-generated acoustic signals such as speech, coughing, and breathing) are promising tools for scalable, non-invasive detection and monitoring of mental health and neurodegenerative diseases. Yet their clinical adoption remains constrained by inconsistent quality and limited usability of publicly available datasets. To address this gap, we present the first systematic FAIR (Findable, Accessible, Interoperable, Reusable) evaluation of 27 publicly available voice biomarker datasets focused on these disease areas. Using the FAIR Data Maturity Model and a structured, priority-weighted scoring method, we assessed FAIRness at subprinciple, principle, and composite levels. Our analysis revealed consistently high Findability but substantial variability and weaknesses in Accessibility, Interoperability, and Reusability. Mental health datasets exhibited greater variability in FAIR scores, while neurodegenerative datasets were slightly more consistent. Repository choice also significantly influenced FAIRness scores. To enhance dataset quality and clinical utility, we recommend adopting structured, domain-specific metadata standards, prioritizing FAIR-compliant repositories, and routinely applying structured FAIR evaluation frameworks. These findings provide actionable guidance to improve dataset interoperability and reuse, thereby accelerating the clinical translation of voice biomarker technologies.
Problem

Research questions and friction points this paper is trying to address.

Assess FAIRness of public voice biomarker datasets for mental health and neurodegenerative diseases
Evaluate how inconsistent dataset quality and limited usability constrain clinical adoption
Identify variability in Accessibility, Interoperability, and Reusability principles
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic FAIR evaluation framework
Priority-weighted scoring methodology
Domain-specific metadata standards recommendation
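
The priority-weighted scoring listed above can be illustrated with a minimal sketch. It assumes the three indicator priorities used by the FAIR Data Maturity Model (essential, important, useful) and indicator scores normalized to the range 0 to 1. The numeric weights, the function names, and the sample indicators are hypothetical; the paper's actual weights and aggregation rules are not given in this summary.

```python
from collections import defaultdict

# Hypothetical weights per priority level (assumption, not from the paper).
PRIORITY_WEIGHTS = {"essential": 3.0, "important": 2.0, "useful": 1.0}


def weighted_score(indicators):
    """Weighted mean of (priority, score) pairs, with scores in [0, 1]."""
    total_weight = sum(PRIORITY_WEIGHTS[p] for p, _ in indicators)
    if total_weight == 0:
        return 0.0
    return sum(PRIORITY_WEIGHTS[p] * s for p, s in indicators) / total_weight


def fair_scores(assessment):
    """Aggregate indicator scores at three levels.

    assessment: list of (sub_principle, priority, score) tuples,
    e.g. ("F1", "essential", 1.0). The first letter of the
    sub-principle ID (F, A, I, R) identifies the principle.
    """
    by_sub = defaultdict(list)
    by_principle = defaultdict(list)
    for sub, priority, score in assessment:
        by_sub[sub].append((priority, score))
        by_principle[sub[0]].append((priority, score))
    return {
        "subprinciple": {k: weighted_score(v) for k, v in by_sub.items()},
        "principle": {k: weighted_score(v) for k, v in by_principle.items()},
        "composite": weighted_score([(p, s) for _, p, s in assessment]),
    }


# Illustrative, made-up indicator results for a single dataset.
example = [
    ("F1", "essential", 1.0),
    ("F2", "important", 0.5),
    ("A1", "essential", 0.0),
    ("R1.1", "useful", 1.0),
]
result = fair_scores(example)
print(result["principle"])
print(round(result["composite"], 3))
```

With weights like these, a failed essential indicator lowers the composite score more than a failed useful one, which is the intent of weighting by priority.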
Ishaan Mahapatra
Haslett High School, Haslett, MI, USA
Nihar R. Mahapatra
Michigan State University