🤖 AI Summary
This study investigates prompt-based, fine-tuning-free approaches using large language models (LLMs) for unsupervised Alzheimer's disease (AD) detection from speech-derived transcripts. To address the challenges of interpretability, clinical alignment, and modality bias in LLM-based AD screening, we propose three key innovations: (1) a deterministic calibration method that aligns few-shot prompting probability outputs with clinical gold-standard Mini-Mental State Examination (MMSE) scores, enhancing both clinical validity and result interpretability; (2) a class-balanced, nested interleaved prompting strategy augmented by a multimodal foundation model (GPT-5) that generates reasoning-enhanced exemplars, thereby improving diagnostic transparency and logical coherence; and (3) a text-only reasoning evaluation framework that eliminates modality-induced biases. Evaluated on the ADReSS dataset, our approach achieves 82% accuracy and 86% AUC, state-of-the-art performance for prompt-driven AD detection.
📝 Abstract
Prompting large language models is a training-free method for detecting Alzheimer's disease from speech transcripts. Using the ADReSS dataset, we revisit zero-shot prompting and study few-shot prompting with a class-balanced protocol using nested interleaving and a strict schema, sweeping up to 20 examples per class. We evaluate two variants that achieve state-of-the-art prompting results. (i) MMSE-Proxy Prompting: each few-shot example carries a probability anchored to Mini-Mental State Examination bands via a deterministic mapping, enabling AUC computation; this reaches 0.82 accuracy and 0.86 AUC. (ii) Reasoning-augmented Prompting: the few-shot example pool is generated by a multimodal LLM (GPT-5) that takes the Cookie Theft image, transcript, and MMSE score as input and outputs a rationale together with an MMSE-aligned probability; evaluation remains transcript-only and reaches 0.82 accuracy and 0.83 AUC. To our knowledge, this is the first ADReSS study to anchor elicited probabilities to MMSE and to use multimodal exemplar construction to improve interpretability.
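The MMSE-Proxy idea described above can be sketched as a small deterministic lookup: each few-shot exemplar's MMSE score is mapped to a fixed probability anchor, which is then attached to the exemplar in the prompt. A minimal illustration is below; the band cut-offs and probability values are hypothetical assumptions for exposition, not the mapping used in the paper.

```python
def mmse_to_probability(mmse: int) -> float:
    """Deterministically map an MMSE score (0-30) to an AD probability anchor.

    The bands and probabilities here are illustrative placeholders; the
    paper's actual mapping is not reproduced.
    """
    if not 0 <= mmse <= 30:
        raise ValueError("MMSE scores range from 0 to 30")
    # (lower bound of band, anchored probability of AD), highest band first
    bands = [
        (24, 0.10),  # 24-30: within normal range -> low AD probability
        (19, 0.55),  # 19-23: mild cognitive impairment
        (10, 0.80),  # 10-18: moderate impairment
        (0,  0.95),  # 0-9:   severe impairment -> high AD probability
    ]
    for lower, prob in bands:
        if mmse >= lower:
            return prob
    raise AssertionError("unreachable: bands cover 0-30")


def format_exemplar(transcript: str, mmse: int) -> str:
    """Attach the anchored probability to a few-shot exemplar (schema assumed)."""
    return f"Transcript: {transcript}\nP(AD) = {mmse_to_probability(mmse):.2f}"
```

Because every exemplar carries a numeric probability rather than a bare label, the model can be prompted to emit a probability for the test transcript as well, which is what makes AUC computation possible.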