🤖 AI Summary
To address the clinical impracticality of quantitative magnetization transfer (qMT) myelin imaging, whose specialized acquisitions take 20–30 minutes, this work proposes a method for synthesizing quantitative myelin biomarkers, specifically pool size ratio (PSR) maps, from routine T1-weighted (T1w) and FLAIR MRI scans. We introduce a 3D latent diffusion model with a decoupled multi-stream conditioning mechanism that integrates semantic cross-attention, scale-wise 3D ControlNet residual guidance, and LoRA-modulated attention, yielding substantial improvements in boundary sharpness and quantitative consistency with few trainable parameters. A two-stage training strategy is employed: first aligning latent spaces, then freezing the diffusion backbone while jointly optimizing edge-aware and alignment losses. Evaluated on 163 clinically acquired scans via five-fold cross-validation, our method consistently outperforms VAE-, GAN-, and standard diffusion-based baselines, yielding PSR maps with superior anatomical fidelity and quantitative accuracy. Code is publicly available.
📝 Abstract
Quantitative magnetization transfer (qMT) imaging provides myelin-sensitive biomarkers, such as the pool size ratio (PSR), which are valuable for multiple sclerosis (MS) assessment. However, qMT requires specialized 20–30 minute scans. We propose DEMIST to synthesize PSR maps from standard T1w and FLAIR images using a 3D latent diffusion model with three complementary conditioning mechanisms. Our approach has two stages: first, we train separate autoencoders for PSR and anatomical images to learn aligned latent representations. Second, we train a conditional diffusion model in this latent space on top of a frozen diffusion foundation backbone. Conditioning is decoupled into: (i) **semantic** tokens via cross-attention, (ii) **spatial** per-scale residual hints via a 3D ControlNet branch, and (iii) **adaptive** LoRA-modulated attention. We include edge-aware loss terms to preserve lesion boundaries and alignment losses to maintain quantitative consistency, while keeping the number of trainable parameters low and retaining the inductive bias of the pretrained model. We evaluate on 163 scans from 99 subjects using 5-fold cross-validation. Our method outperforms VAE, GAN and diffusion baselines on multiple metrics, producing sharper boundaries and better quantitative agreement with ground truth. Our code is publicly available at https://github.com/MedICL-VU/MS-Synthesis-3DcLDM.
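The abstract's LoRA-modulated attention keeps trainable parameters low by adding a low-rank residual to frozen backbone weights rather than fine-tuning them. A minimal NumPy sketch of that idea follows; the function name, dimensions, and scaling are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def lora_adapt(W_frozen, A, B, alpha=8.0):
    """Low-rank adaptation: the frozen projection W_frozen gets a
    trainable residual (alpha / r) * B @ A, where r is the adapter rank."""
    r = A.shape[0]
    return W_frozen + (alpha / r) * (B @ A)

# Hypothetical dimensions for one attention projection.
d_out, d_in, r = 64, 64, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))   # frozen backbone weight
A = rng.standard_normal((r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-init

# With the customary zero-initialized B, the adapter is a no-op at the
# start of training, so the pretrained model's inductive bias is preserved.
assert np.allclose(lora_adapt(W, A, B), W)

# Trainable parameters per layer: r*(d_in + d_out) vs d_in*d_out for full tuning.
print(r * (d_in + d_out), "vs", d_in * d_out)  # → 512 vs 4096
```

This is why freezing the diffusion backbone while training only the adapters (plus the ControlNet branch) can still steer attention toward the conditioning inputs at a small fraction of the parameter cost.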