What Do LLMs Know About Alzheimer's Disease? Fine-Tuning, Probing, and Data Synthesis for AD Detection

📅 2026-01-20

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This study tackles two problems in early Alzheimer’s disease (AD) detection: scarce labeled data and limited model generalizability. It systematically evaluates and fine-tunes large language models (BERT, T5, and Llama-1B) using a novel multi-loss supervised fine-tuning strategy. The approach is trained and validated on three heterogeneous clinical corpora: Pitt, CCC, and ADRC. Linear probing and cross-corpus transfer analyses show that fine-tuning substantially improves the models’ ability to encode AD-related linguistic signals. Notably, decoder-only architectures such as Llama-1B perform competitively with, and in some cases better than, encoder-decoder models on this task. The method achieves new state-of-the-art results on the Pitt and CCC datasets and performs strongly on ADRC.
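The summary does not say which terms the multi-loss fine-tuning strategy combines. The sketch below is therefore hypothetical. It assumes the objective is a weighted sum of a transcript-level AD-vs-control cross-entropy and an auxiliary token-level language-modeling loss. The function names (`binary_cross_entropy`, `token_nll`, `multi_loss`) and the weights are illustrative and are not the authors' configuration.

```python
import math
from typing import Sequence, Tuple


def binary_cross_entropy(prob_ad: float, label: int, eps: float = 1e-12) -> float:
    """Transcript-level loss: label 1 = AD, 0 = healthy control."""
    p = min(max(prob_ad, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def token_nll(gold_token_probs: Sequence[float], eps: float = 1e-12) -> float:
    """Auxiliary LM loss: mean negative log-likelihood of the gold tokens."""
    if not gold_token_probs:
        raise ValueError("need at least one token probability")
    return -sum(math.log(max(p, eps)) for p in gold_token_probs) / len(gold_token_probs)


def multi_loss(prob_ad: float, label: int, gold_token_probs: Sequence[float],
               weights: Tuple[float, float] = (1.0, 0.5)) -> float:
    """Hypothetical multi-loss objective: w_cls * BCE + w_lm * token NLL."""
    w_cls, w_lm = weights
    return w_cls * binary_cross_entropy(prob_ad, label) + w_lm * token_nll(gold_token_probs)


if __name__ == "__main__":
    # Toy numbers only: the model gives P(AD) = 0.8 for an AD transcript
    # and assigns probabilities 0.5 and 0.25 to two gold tokens.
    print(round(multi_loss(0.8, 1, [0.5, 0.25]), 4))  # prints 0.743
```

In a real fine-tuning loop, each term would be computed on model logits and backpropagated together. The weights trade off the diagnostic signal against preserving the model's language-modeling behavior.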

📝 Abstract

Reliable early detection of Alzheimer's disease (AD) is challenging, particularly given the limited availability of labeled data. While large language models (LLMs) have shown strong transfer capabilities across domains, adapting them to the AD domain through supervised fine-tuning remains largely unexplored. In this work, we fine-tune an LLM for AD detection and investigate how task-relevant information is encoded within its internal representations. We employ probing techniques to analyze intermediate activations across transformer layers, and we observe that, after fine-tuning, the probing values of specific words and special markers change substantially, indicating that these elements assume a crucial role in the model's improved detection performance. Guided by this insight, we design a curated set of task-aware special markers and train a sequence-to-sequence model as a data-synthesis tool that leverages these markers to generate structurally consistent and diagnostically informative synthetic samples. We evaluate the synthesized data both intrinsically and by incorporating it into downstream training pipelines.
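Layer-wise linear probing, as described in the abstract, can be illustrated with a minimal sketch. Freeze the model, collect activations per layer, fit one logistic-regression probe per layer, and compare held-out accuracy. The activations here are synthetic toy vectors, not real model outputs; with Hugging Face Transformers they would typically come from calling the model with `output_hidden_states=True`. All helper names are hypothetical. The sketch probes whole layers only and does not reproduce how the paper computes "probing values" for individual words or special markers.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(xs: Sequence[Vector], ys: Sequence[int],
                       lr: float = 0.5, epochs: int = 200) -> Tuple[Vector, float]:
    """Fit a logistic-regression probe on frozen features via full-batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # dBCE/dz
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w: Vector, b: float, xs: Sequence[Vector], ys: Sequence[int]) -> float:
    """Fraction of examples whose thresholded probe output matches the label."""
    preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


def probe_layers(layers: Dict[str, List[Vector]], labels: List[int],
                 train_frac: float = 0.75) -> Dict[str, float]:
    """Train one probe per layer on a train split; return held-out accuracy per layer."""
    cut = int(len(labels) * train_frac)
    scores = {}
    for name, feats in layers.items():
        w, b = train_linear_probe(feats[:cut], labels[:cut])
        scores[name] = accuracy(w, b, feats[cut:], labels[cut:])
    return scores


def make_toy_layers(n: int = 200, seed: int = 0) -> Tuple[Dict[str, List[Vector]], List[int]]:
    """Synthetic stand-ins for per-layer activations (NOT real model outputs)."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    layers = {
        # Features unrelated to the label: a probe should stay near chance.
        "layer_early": [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in labels],
        # First feature shifted by the label: linearly decodable.
        "layer_late": [[(2 * y - 1) + rng.gauss(0, 0.3), rng.gauss(0, 1)] for y in labels],
    }
    return layers, labels


if __name__ == "__main__":
    toy_layers, toy_labels = make_toy_layers()
    for name, acc in probe_layers(toy_layers, toy_labels).items():
        print(f"{name}: held-out probe accuracy = {acc:.2f}")
```

A probe is kept linear on purpose. A high held-out accuracy then suggests the label is linearly decodable from that layer, rather than computed by the probe itself. Comparing per-layer scores before and after fine-tuning is one way to see where task-relevant information emerges.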

Problem

Research questions and friction points this paper is trying to address.

Alzheimer's disease

large language models

early detection

text-based diagnosis

limited labeled data

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-loss fine-tuning

linear probing

Alzheimer's disease detection