🤖 AI Summary
This study investigates prompt-based, fine-tuning-free approaches using large language models (LLMs) for unsupervised Alzheimer's disease (AD) detection from speech-derived transcripts. To address the challenges of interpretability, clinical alignment, and modality bias in LLM-based AD screening, we propose three key innovations: (1) a deterministic calibration method that aligns few-shot prompting probability outputs with clinical gold-standard Mini-Mental State Examination (MMSE) scores, enhancing both clinical validity and result interpretability; (2) a class-balanced, nested interleaved prompting strategy augmented by a multimodal foundation model (GPT-5) that generates reasoning-enhanced exemplars, thereby improving diagnostic transparency and logical coherence; and (3) a text-only reasoning evaluation framework that eliminates modality-induced biases. Evaluated on the ADReSS dataset, our approach achieves 82% accuracy and 86% AUC, state-of-the-art performance for prompt-driven AD detection.
📝 Abstract
Prompting large language models is a training-free method for detecting Alzheimer's disease from speech transcripts. Using the ADReSS dataset, we revisit zero-shot prompting and study few-shot prompting with a class-balanced protocol using nested interleaving and a strict schema, sweeping up to 20 examples per class. We evaluate two variants that achieve state-of-the-art prompting results. (i) MMSE-Proxy Prompting: each few-shot example carries a probability anchored to Mini-Mental State Examination bands via a deterministic mapping, enabling AUC computation; this reaches 0.82 accuracy and 0.86 AUC. (ii) Reasoning-augmented Prompting: the few-shot example pool is generated by a multimodal LLM (GPT-5) that takes the Cookie Theft image, transcript, and MMSE score as input and outputs a rationale together with an MMSE-aligned probability; evaluation remains transcript-only and reaches 0.82 accuracy and 0.83 AUC. To our knowledge, this is the first ADReSS study to anchor elicited probabilities to MMSE and to use multimodal exemplar construction to improve interpretability.
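The MMSE-Proxy idea described above can be sketched as a small deterministic lookup: each few-shot exemplar's MMSE score is mapped to a fixed probability anchor, which is then attached to the exemplar in the prompt. A minimal illustration is below; the band cut-offs and probability values are hypothetical assumptions for exposition, not the mapping used in the paper.

```python
def mmse_to_probability(mmse: int) -> float:
    """Deterministically map an MMSE score (0-30) to an AD probability anchor.

    The bands and probabilities here are illustrative placeholders; the
    paper's actual mapping is not reproduced.
    """
    if not 0 <= mmse <= 30:
        raise ValueError("MMSE scores range from 0 to 30")
    # (lower bound of band, anchored probability of AD), highest band first
    bands = [
        (24, 0.10),  # 24-30: within normal range -> low AD probability
        (19, 0.55),  # 19-23: mild cognitive impairment
        (10, 0.80),  # 10-18: moderate impairment
        (0,  0.95),  # 0-9:   severe impairment -> high AD probability
    ]
    for lower, prob in bands:
        if mmse >= lower:
            return prob
    raise AssertionError("unreachable: bands cover 0-30")


def format_exemplar(transcript: str, mmse: int) -> str:
    """Attach the anchored probability to a few-shot exemplar (schema assumed)."""
    return f"Transcript: {transcript}\nP(AD) = {mmse_to_probability(mmse):.2f}"
```

Because every exemplar carries a numeric probability rather than a bare label, the model can be prompted to emit a probability for the test transcript as well, which is what makes AUC computation possible.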