🤖 AI Summary
This study addresses scarce labeled data and limited model generalizability in early Alzheimer’s disease (AD) detection by systematically evaluating and fine-tuning large language models (BERT, T5, and Llama-1B) with a novel multi-loss supervised fine-tuning strategy. The models are trained and validated on three heterogeneous clinical corpora: Pitt, CCC, and ADRC. Linear probing and cross-corpus transfer analyses show that fine-tuning substantially improves the models’ ability to encode AD-related linguistic signals. Notably, decoder-only architectures such as Llama-1B perform competitively with, and in some cases better than, encoder-decoder models on this task. The method achieves new state-of-the-art results on the Pitt and CCC datasets and shows strong performance on ADRC.
📝 Abstract
Reliable early detection of Alzheimer's disease (AD) is challenging, particularly due to limited availability of labeled data. While large language models (LLMs) have shown strong transfer capabilities across domains, adapting them to the AD domain through supervised fine-tuning remains largely unexplored. In this work, we fine-tune an LLM for AD detection and investigate how task-relevant information is encoded within its internal representations. We employ probing techniques to analyze intermediate activations across transformer layers, and we observe that, after fine-tuning, the probing values of specific words and special markers change substantially, indicating that these elements assume a crucial role in the model's improved detection performance. Guided by this insight, we design a curated set of task-aware special markers and train a sequence-to-sequence model as a data-synthesis tool that leverages these markers to generate structurally consistent and diagnostically informative synthetic samples. We evaluate the synthesized data both intrinsically and by incorporating it into downstream training pipelines.
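As a rough illustration of the layer-wise probing idea described above, the sketch below fits a separate linear probe (logistic regression) on each layer's activations and reports held-out accuracy per layer. It is a minimal sketch under stated assumptions, not the authors' procedure: the activations are synthetic Gaussian vectors rather than hidden states from a fine-tuned LLM, and the helper names (`train_linear_probe`, `synthetic_layer_activations`, `probe_layers`) and the per-layer `signal` strengths are hypothetical.

```python
import math
import random


def train_linear_probe(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def probe_accuracy(weights, bias, features, labels):
    """Fraction of examples the linear probe classifies correctly."""
    correct = 0
    for x, y in zip(features, labels):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        correct += int((z > 0) == (y == 1))
    return correct / len(labels)


def synthetic_layer_activations(labels, signal, dim=8, seed=0):
    """ASSUMPTION: stand-in for frozen hidden states of one layer.

    Each vector is Gaussian noise; the label shifts dimension 0 by +/- signal,
    so a larger signal means the label is more linearly decodable.
    """
    rng = random.Random(seed)
    acts = []
    for y in labels:
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += signal if y == 1 else -signal
        acts.append(vec)
    return acts


def probe_layers(signals, n_train=200, n_test=200):
    """Train one probe per (synthetic) layer and return held-out accuracies."""
    rng = random.Random(42)
    train_labels = [rng.randint(0, 1) for _ in range(n_train)]
    test_labels = [rng.randint(0, 1) for _ in range(n_test)]
    accuracies = []
    for layer, signal in enumerate(signals):
        train_x = synthetic_layer_activations(train_labels, signal, seed=layer)
        test_x = synthetic_layer_activations(test_labels, signal, seed=1000 + layer)
        weights, bias = train_linear_probe(train_x, train_labels)
        accuracies.append(probe_accuracy(weights, bias, test_x, test_labels))
    return accuracies


if __name__ == "__main__":
    # Hypothetical signal strengths for three layers, weakest to strongest.
    for layer, acc in enumerate(probe_layers([0.0, 1.0, 3.0])):
        print(f"layer {layer}: probe accuracy = {acc:.2f}")
```

In a real analysis, the synthetic vectors would be replaced by activations extracted from the model before and after fine-tuning, and the change in probe accuracy per layer (or per token position, such as a special marker) would be the quantity of interest.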