🤖 AI Summary
This study addresses the dual challenge of limited accuracy and poor clinical interpretability in postmenstrual age (PMA) estimation from neonatal brain MRI. We propose an interpretable regression framework powered by a multimodal large language model (MLLM). Methodologically, we adapt Qwen2.5-VL-7B via instruction tuning and Low-Rank Adaptation (LoRA) to jointly optimize PMA regression and natural-language explanation generation, using 2D cortical projection maps derived from MRI as input. To our knowledge, this is the first application of parameter-efficient fine-tuning to neonatal neurodevelopmental quantification. The framework achieves a low prediction error (95% CI: 0.78–1.52 weeks) while autonomously generating clinically meaningful textual explanations grounded in established neurodevelopmental milestones. By bridging quantitative modeling with domain-specific interpretability, our approach substantially improves the transparency and clinical trustworthiness of AI in perinatal neuroscience.
📝 Abstract
Accurate estimation of postmenstrual age (PMA) at scan is crucial for assessing neonatal development and health. While deep learning models have achieved high accuracy in predicting PMA from brain MRI, they often function as black boxes, offering limited transparency and interpretability in clinical decision support. In this work, we address the dual challenge of accuracy and interpretability by adapting a multimodal large language model (MLLM) to perform both precise PMA prediction and clinically relevant explanation generation. We introduce a parameter-efficient fine-tuning (PEFT) strategy using instruction tuning and Low-Rank Adaptation (LoRA) applied to the Qwen2.5-VL-7B model. The model is trained on four 2D cortical surface projection maps derived from neonatal MRI scans. By employing distinct prompts for training and inference, our approach enables the MLLM to perform a regression task during training and to generate clinically relevant explanations during inference. The fine-tuned model achieves a low prediction error, with a 95% confidence interval of 0.78 to 1.52 weeks, while producing interpretable outputs grounded in developmental features. This marks a significant step toward transparent and trustworthy AI systems in perinatal neuroscience.
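As background on the Low-Rank Adaptation step named above, the sketch below shows numerically how a LoRA update augments a frozen weight matrix with a trainable low-rank term, and why this is parameter-efficient. All sizes, the rank `r`, and the scaling `alpha` are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer dimensions and LoRA hyperparameters (not from the paper).
d_out, d_in, r, alpha = 64, 64, 8, 16

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialised

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B are updated
    # during fine-tuning, while W stays frozen.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialised to zero, the adapted layer exactly matches the frozen layer,
# so fine-tuning starts from the pretrained model's behaviour.
assert np.allclose(lora_forward(x), W @ x)

# Trainable-parameter count vs. full fine-tuning of this layer:
full_params = W.size            # 64 * 64 = 4096
lora_params = A.size + B.size   # 8 * 64 + 64 * 8 = 1024
print(full_params, lora_params)
```

At realistic model scale (7B parameters, as for Qwen2.5-VL-7B) the trainable fraction becomes far smaller than this toy ratio, which is what makes instruction tuning of such a model practical on modest hardware.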