National Institute on Aging PREPARE Challenge: Early Detection of Cognitive Impairment Using Speech -- The SpeechCARE Solution

📅 2025-11-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Alzheimer’s disease and related dementias (ADRD) are highly prevalent among older adults and frequently underdiagnosed, necessitating non-invasive, scalable tools for early screening. To address this, the authors propose a multimodal speech analysis framework that jointly models acoustic, linguistic, and demographic features. The method employs a Mixture-of-Experts (MoE)-inspired dynamic cross-modal fusion architecture, integrating pretrained multilingual speech and language models, automatic speech recognition (ASR), large language model (LLM)-driven anomaly detection, and a SHAP-based interpretability module, with a weighted loss function to counter class imbalance. Evaluated on a three-class task distinguishing healthy controls, mild cognitive impairment (MCI), and Alzheimer’s disease, the framework achieves AUC = 0.88 and F1 = 0.72; for MCI-specific detection, it attains AUC = 0.90 and F1 = 0.62. Bias analysis shows minimal demographic disparities, with residual bias affecting adults over 80 mitigated through oversampling and the weighted loss. This work advances speech-based ADRD screening by improving accuracy, robustness, and clinical interpretability.

📝 Abstract
Alzheimer's disease and related dementias (ADRD) affect one in five adults over 60, yet more than half of individuals with cognitive decline remain undiagnosed. Speech-based assessments show promise for early detection, as phonetic motor planning deficits alter acoustic features (e.g., pitch, tone), while memory and language impairments lead to syntactic and semantic errors. However, conventional speech-processing pipelines with hand-crafted features or general-purpose audio classifiers often exhibit limited performance and generalizability. To address these limitations, we introduce SpeechCARE, a multimodal speech processing pipeline that leverages pretrained, multilingual acoustic and linguistic transformer models to capture subtle speech-related cues associated with cognitive impairment. Inspired by the Mixture of Experts (MoE) paradigm, SpeechCARE employs a dynamic fusion architecture that weights transformer-based acoustic, linguistic, and demographic inputs, allowing integration of additional modalities (e.g., social factors, imaging) and enhancing robustness across diverse tasks. Its robust preprocessing includes automatic transcription, large language model (LLM)-based anomaly detection, and task identification. A SHAP-based explainability module and LLM reasoning highlight each modality's contribution to decision-making. SpeechCARE achieved AUC = 0.88 and F1 = 0.72 for classifying cognitively healthy, MCI, and AD individuals, with AUC = 0.90 and F1 = 0.62 for MCI detection. Bias analysis showed minimal disparities, except for adults over 80. Mitigation techniques included oversampling and weighted loss. Future work includes deployment in real-world care settings (e.g., VNS Health, Columbia ADRC) and EHR-integrated explainability for underrepresented populations in New York City.
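The bias-mitigation step mentioned above combines oversampling with a weighted loss. The paper does not publish its implementation; the sketch below illustrates the general technique with inverse-frequency class weights applied to cross-entropy. All function and variable names are illustrative, not from SpeechCARE.

```python
import math
from collections import Counter

def class_weights(labels):
    """Inverse-frequency class weights, normalized so that the
    weighted class counts sum back to the dataset size."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

def weighted_cross_entropy(probs, label, weights):
    """Cross-entropy for one example, scaled by its class weight,
    so errors on rare classes (e.g. MCI) are penalized more."""
    return -weights[label] * math.log(probs[label])

# Illustrative imbalanced label set: many healthy controls, few MCI/AD.
labels = ["hc"] * 6 + ["mci"] * 2 + ["ad"] * 2
w = class_weights(labels)          # rare classes get weights > 1
loss = weighted_cross_entropy({"hc": 0.3, "mci": 0.5, "ad": 0.2}, "mci", w)
```

In a training loop these weights would typically be passed to the framework's weighted loss (e.g. a `weight` argument on a cross-entropy criterion) rather than computed per example as here.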
Problem

Research questions and friction points this paper is trying to address.

Detecting cognitive impairment early using speech analysis
Overcoming limitations of conventional speech-processing pipelines
Integrating multimodal inputs for robust dementia assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses pretrained multilingual acoustic and linguistic transformers
Employs dynamic fusion architecture weighting multimodal inputs
Includes robust preprocessing with LLM-based anomaly detection
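The dynamic fusion idea listed above can be sketched as a softmax gate over per-modality embeddings, in the spirit of a Mixture of Experts: a gate assigns each modality a weight, and the fused representation is the weighted sum. This is a minimal illustration, not the paper's implementation; in SpeechCARE the gate scores would come from a learned network conditioned on the inputs, whereas here they are passed in directly, and all names are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def gated_fusion(embeddings, gate_scores):
    """MoE-style fusion: weight each modality embedding by a softmax
    gate, then sum into one fused vector.

    embeddings:  dict modality -> vector (all the same length)
    gate_scores: dict modality -> scalar logit (learned in practice)
    Returns (fused_vector, per-modality weights).
    """
    mods = list(embeddings)
    weights = softmax([gate_scores[m] for m in mods])
    dim = len(next(iter(embeddings.values())))
    fused = [0.0] * dim
    for w, m in zip(weights, mods):
        for i, v in enumerate(embeddings[m]):
            fused[i] += w * v
    return fused, dict(zip(mods, weights))

# Toy 2-D embeddings for the three modalities the paper fuses.
emb = {"acoustic": [1.0, 0.0], "linguistic": [0.0, 1.0], "demographic": [1.0, 1.0]}
fused, weights = gated_fusion(emb, {"acoustic": 0.0, "linguistic": 0.0, "demographic": 0.0})
```

Because the weights are produced per input, the gate can lean on acoustic cues for one speaker and linguistic cues for another, which is what makes this fusion "dynamic" and lets new modalities (e.g. social factors, imaging) be added as extra experts.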