🤖 AI Summary
This work addresses the challenge of model generalization in arterial spin labeling (ASL) cerebral blood flow (CBF) imaging, which is hindered by multicenter protocol heterogeneity, variable image quality, and scarce annotations. To overcome these limitations, we propose a self-supervised pre-training approach based on a three-dimensional masked autoencoder, pairing 3D masked image modeling with a Vision Transformer backbone. Leveraging the largest ASL CBF dataset to date, comprising 11,405 subjects, our method learns highly transferable representations. It significantly outperforms existing self-supervised neuroimaging approaches across three diagnostic classification tasks and one image quality control task, demonstrating markedly improved cross-center generalization. The pre-trained weights and source code will be publicly released to facilitate further research.
📝 Abstract
Arterial spin labeling (ASL) perfusion MRI allows direct quantification of regional cerebral blood flow (CBF) without exogenous contrast, enabling noninvasive measurements that can be repeated without the constraints imposed by contrast injection. ASL is increasingly acquired in research studies and clinical MRI protocols. Building on successes in structural imaging, recent efforts have applied deep learning-based methods to improve image quality, enable automated quality control, and derive robust quantitative and predictive biomarkers from ASL-derived CBF. However, progress has been limited by variable image quality, substantial inter-site, inter-vendor, and inter-protocol differences, and the limited availability of labeled datasets needed to train models that generalize across cohorts. To address these challenges, we introduce ICHOR, a self-supervised pre-training approach for ASL CBF maps that learns transferable representations using 3D masked autoencoders. ICHOR is pre-trained via masked image modeling with a Vision Transformer backbone and can be used as a general-purpose encoder for downstream ASL tasks. For pre-training, we curated one of the largest ASL datasets to date, comprising 11,405 ASL CBF scans from 14 studies spanning multiple sites and acquisition protocols. We evaluated the pre-trained ICHOR encoder on three downstream diagnostic classification tasks and one ASL CBF map quality-prediction regression task. Across all evaluations, ICHOR outperformed existing neuroimaging self-supervised pre-training methods adapted to ASL. Pre-trained weights and code will be made publicly available.
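To make the pre-training objective concrete, the sketch below shows the data-side core of 3D masked autoencoding as typically formulated: a CBF volume is split into non-overlapping 3D patches, a random subset is masked out, and a reconstruction loss is computed only on the masked patches. This is a minimal NumPy illustration of the general MAE recipe, not ICHOR's implementation; the patch size, mask ratio, and function names are illustrative assumptions, and the ViT encoder/decoder that would sit between masking and reconstruction is omitted.

```python
import numpy as np

def patchify_3d(volume, patch=16):
    # Split a (D, H, W) volume into non-overlapping patch^3 cubes,
    # each flattened into a vector of length patch**3.
    D, H, W = volume.shape
    d, h, w = D // patch, H // patch, W // patch
    x = volume[:d * patch, :h * patch, :w * patch]
    x = x.reshape(d, patch, h, patch, w, patch)
    # Reorder to (d, h, w, patch, patch, patch), then flatten each cube.
    x = x.transpose(0, 2, 4, 1, 3, 5).reshape(d * h * w, patch ** 3)
    return x

def random_mask(n_patches, mask_ratio=0.75, rng=None):
    # Randomly partition patch indices into visible and masked sets.
    # Only the visible patches would be fed to the ViT encoder.
    if rng is None:
        rng = np.random.default_rng(0)
    n_keep = int(n_patches * (1 - mask_ratio))
    perm = rng.permutation(n_patches)
    return perm[:n_keep], perm[n_keep:]

def masked_mse(pred, target, masked_idx):
    # MAE-style loss: mean squared error computed only on masked patches.
    diff = pred[masked_idx] - target[masked_idx]
    return float(np.mean(diff ** 2))

# Toy example on a random 64^3 "CBF map" (illustrative sizes only).
rng = np.random.default_rng(0)
vol = rng.standard_normal((64, 64, 64))
patches = patchify_3d(vol, patch=16)          # 4*4*4 = 64 patches
keep, masked = random_mask(len(patches), mask_ratio=0.75, rng=rng)
loss = masked_mse(np.zeros_like(patches), patches, masked)
```

At a 75% mask ratio the encoder sees only 16 of 64 patches here, which is what makes MAE pre-training cheap relative to processing full volumes; the decoder then reconstructs the remaining 48 and the loss ignores visible patches entirely.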