🤖 AI Summary
The “black-box” nature of deep learning models hinders their clinical adoption for Parkinson’s disease (PD) speech detection.
Method: We propose a mask-driven sparse autoencoder activation mechanism to construct a disentangled, interpretable speech representation dictionary. Our framework integrates deep speech models with sparse autoencoders, incorporates attention-based analysis and multi-scale spectral feature extraction, and fuses structural MRI data (e.g., putaminal volume).
Contribution/Results: We identify neurobiologically meaningful, interpretable biomarkers, including reduced spectral flux and increased spectral flatness in low-energy frequency bands. Critically, we demonstrate for the first time a significant negative correlation between spectral flux and putaminal atrophy (p < 0.01). This approach substantially improves model transparency and clinical interpretability, establishing a verifiable, traceable biomarker pathway for AI-enabled, noninvasive early screening for PD.
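To make the mask-driven sparse activation concrete, here is a minimal numpy sketch of a sparse autoencoder forward pass with a hard top-k activation mask, one common way such masking is realized. The function name, the top-k choice, and all shapes are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec, k=8):
    """Sparse autoencoder forward pass with a hard top-k activation mask.

    Only the k largest ReLU pre-activations per sample survive; the rest
    are zeroed, yielding a sparse dictionary code. (Illustrative stand-in
    for a mask-driven activation; details are assumptions.)
    """
    pre = np.maximum(x @ W_enc + b_enc, 0.0)      # ReLU pre-activations
    # Binary mask keeping the top-k activations per row.
    thresh = np.sort(pre, axis=1)[:, -k][:, None]
    mask = (pre >= thresh) & (pre > 0)
    z = pre * mask                                # sparse dictionary code
    x_hat = z @ W_dec + b_dec                     # reconstruction
    return z, x_hat

rng = np.random.default_rng(0)
d, m = 16, 64                                     # input dim, dictionary size
x = rng.normal(size=(4, d))
W_enc = rng.normal(size=(d, m)) * 0.1
W_dec = rng.normal(size=(m, d)) * 0.1
z, x_hat = sae_forward(x, W_enc, np.zeros(m), W_dec, np.zeros(d), k=8)
```

Each row of `z` then has at most `k` nonzero entries, so every input is explained by a small set of dictionary atoms, which is what makes the learned entries inspectable.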
📝 Abstract
Speech holds promise as a cost-effective and non-invasive biomarker for neurological conditions such as Parkinson's disease (PD). While deep learning systems trained on raw audio can uncover subtle signals unavailable to hand-crafted features, their black-box nature hinders clinical adoption. To address this, we apply sparse autoencoders (SAEs) to uncover interpretable internal representations from a speech-based PD detection system. We introduce a novel mask-based activation for adapting SAEs to small biomedical datasets, creating sparse, disentangled dictionary representations. These dictionary entries show strong associations with characteristic articulatory deficits in PD speech, such as reduced spectral flux and increased spectral flatness in the low-energy regions highlighted by the model's attention. We further show that spectral flux correlates with volumetric measurements of the putamen from MRI scans, demonstrating the potential of SAEs to reveal clinically relevant biomarkers for disease monitoring and diagnosis.
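For readers unfamiliar with the two spectral descriptors named above, here is a minimal numpy sketch using their common signal-processing definitions (these follow standard MIR conventions, not necessarily the paper's exact formulation): flux measures frame-to-frame spectral change, flatness the geometric-to-arithmetic mean ratio of the spectrum.

```python
import numpy as np

def spectral_flux(S):
    """Mean squared positive change in the magnitude spectrum between
    consecutive frames. S: (frames, bins) magnitude spectrogram."""
    diff = np.diff(S, axis=0)
    return float(np.mean(np.sum(np.maximum(diff, 0.0) ** 2, axis=1)))

def spectral_flatness(S, eps=1e-10):
    """Per-frame geometric mean / arithmetic mean of the spectrum.
    Near 1 for noise-like spectra, near 0 for tonal ones."""
    S = S + eps
    gmean = np.exp(np.mean(np.log(S), axis=1))
    return gmean / np.mean(S, axis=1)

# Toy check: a steady pure tone is less flat, and changes less frame to
# frame, than white noise.
rng = np.random.default_rng(0)
n_frames, n_fft = 50, 512
t = np.arange(n_fft)
tone = np.abs(np.fft.rfft(np.sin(2 * np.pi * 0.05 * t)))[None, :].repeat(n_frames, 0)
noise = np.abs(np.fft.rfft(rng.normal(size=(n_frames, n_fft)), axis=1))
```

Under these definitions, reduced flux corresponds to more static, less articulated speech, and increased flatness to a noisier, less harmonic spectrum, consistent with the PD articulatory deficits the paper describes.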