🤖 AI Summary
Clinical deployment of AI for Alzheimer’s disease (AD) diagnosis requires both high accuracy and interpretability to ensure clinician trust and patient understanding.
Method: We propose a biologically grounded, multimodal explainable AI framework that integrates single-cell and bulk RNA-seq data via Gaussian process Bayesian unmixing (GP-unmix); incorporates expression quantitative trait locus (eQTL)-informed regulatory priors to guide a neural classifier; and employs a large language model (LLM) as a controllable post-hoc reasoning module to generate personalized, actionable diagnostic reports—avoiding end-to-end black-box prediction.
Contribution/Results: The framework achieves 88.0% diagnostic accuracy on an independent validation cohort. Human expert evaluation confirms the reports’ high factual accuracy, biological plausibility, and adaptability to both clinicians and patients. By jointly embedding domain-specific biological knowledge and human-centered explanation mechanisms, our approach establishes a new paradigm for trustworthy, clinically deployable AI in neurodegenerative disease diagnosis.
📝 Abstract
Building trustworthy clinical AI systems requires not only accurate predictions but also transparent, biologically grounded explanations. We present DiagnoLLM, a hybrid framework that integrates Bayesian deconvolution, eQTL-guided deep learning, and LLM-based narrative generation for interpretable disease diagnosis. DiagnoLLM begins with GP-unmix, a Gaussian Process-based hierarchical model that infers cell-type-specific gene expression profiles from bulk and single-cell RNA-seq data while modeling biological uncertainty. These features, combined with regulatory priors from eQTL analysis, power a neural classifier that achieves high predictive performance in Alzheimer's Disease (AD) detection (88.0% accuracy). To support human understanding and trust, we introduce an LLM-based reasoning module that translates model outputs into audience-specific diagnostic reports, grounded in clinical features, attribution signals, and domain knowledge. Human evaluations confirm that these reports are accurate, actionable, and appropriately tailored for both physicians and patients. Our findings show that LLMs, when deployed as post-hoc reasoners rather than end-to-end predictors, can serve as effective communicators within hybrid diagnostic pipelines.
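To make the deconvolution step concrete: the core idea behind methods like GP-unmix is to express a bulk RNA-seq sample as a mixture of cell-type-specific expression signatures and recover the mixing proportions. The following is a minimal sketch of that idea only, not the actual GP-unmix model; it uses a plain least-squares fit with simulated data (all names and values here are illustrative assumptions), whereas the paper's method is a Gaussian Process hierarchical model that also quantifies uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical signature matrix (genes x cell types), e.g. derived
# from single-cell RNA-seq reference profiles. Values are simulated.
n_genes, n_types = 200, 4
signatures = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_types))

# Simulate one bulk sample as a noisy mixture of the cell types.
true_props = np.array([0.5, 0.3, 0.15, 0.05])
bulk = signatures @ true_props + rng.normal(0.0, 0.05, size=n_genes)

# Deconvolve: least-squares fit of proportions, then clip to
# nonnegative values and renormalize so they sum to 1.
coef, *_ = np.linalg.lstsq(signatures, bulk, rcond=None)
est_props = np.clip(coef, 0.0, None)
est_props /= est_props.sum()
```

With enough informative genes, the estimated proportions closely track the true ones; the Bayesian treatment in the paper additionally propagates uncertainty in these estimates downstream to the classifier.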