MedDiff-FM: A Diffusion-based Foundation Model for Versatile Medical Image Applications

📅 2024-10-20
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing medical diffusion models are typically constrained to single anatomical regions, tasks, or datasets, limiting their generalizability and clinical utility. To address this, the authors propose MedDiff-FM, a 3D medical diffusion foundation model for multi-anatomical (head-to-abdomen), multi-task learning. The method introduces: (1) a general-purpose medical diffusion foundation architecture; (2) position embeddings plus region classes and anatomical structures to capture multi-level spatial relationships; and (3) joint image-level and patch-level representation learning. The pretrained model directly handles denoising, anomaly detection, and image synthesis, and further supports super-resolution, lesion generation, and lesion inpainting via lightweight ControlNet fine-tuning with task-specific conditions. Pretrained on multi-center CT data spanning anatomical domains, MedDiff-FM reports gains on multiple public benchmarks (e.g., +2.1 dB PSNR, +5.3% AUROC, and improved synthesis fidelity) and demonstrates strong few-shot adaptability to downstream tasks.

📝 Abstract
Diffusion models have achieved significant success in both natural image and medical image domains, encompassing a wide range of applications. Previous investigations in medical images have often been constrained to specific anatomical regions, particular applications, and limited datasets, resulting in isolated diffusion models. This paper introduces a diffusion-based foundation model to address a diverse range of medical image tasks, namely MedDiff-FM. MedDiff-FM leverages 3D CT images from multiple publicly available datasets, covering anatomical regions from head to abdomen, to pre-train a diffusion foundation model, and explores the capabilities of the diffusion foundation model across a variety of application scenarios. The diffusion foundation model handles multi-level integrated image processing both at the image-level and patch-level, utilizes position embedding to establish multi-level spatial relationships, and leverages region classes and anatomical structures to capture certain anatomical regions. MedDiff-FM manages several downstream tasks seamlessly, including image denoising, anomaly detection, and image synthesis. MedDiff-FM is also capable of performing super-resolution, lesion generation, and lesion inpainting by rapidly fine-tuning the diffusion foundation model using ControlNet with task-specific conditions. The experimental results demonstrate the effectiveness of MedDiff-FM in addressing diverse downstream medical image tasks.
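As a rough illustration of the multi-level processing the abstract describes — handling a volume at both image level and patch level, with position embeddings encoding where each patch sits in the full scan — here is a minimal NumPy sketch. The function names, patch size, and sinusoidal embedding scheme are our own assumptions for illustration, not MedDiff-FM's actual code:

```python
import numpy as np

def extract_patches(volume, patch=16):
    """Split a cubic 3D volume into non-overlapping cubic patches.
    Returns (patches, coords): the patch stack and each patch's grid coordinate."""
    n = volume.shape[0] // patch
    patches, coords = [], []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                patches.append(volume[i*patch:(i+1)*patch,
                                      j*patch:(j+1)*patch,
                                      k*patch:(k+1)*patch])
                coords.append((i, j, k))
    return np.stack(patches), np.array(coords)

def position_embedding(coords, dim=36):
    """Sinusoidal embedding of 3D patch coordinates (one sixth of `dim`
    per sin/cos pair per axis), giving the model a way to relate a
    patch-level view back to its location in the whole image."""
    per_axis = dim // 6  # sin + cos per frequency, over 3 axes
    freqs = 1.0 / (10000 ** (np.arange(per_axis) / per_axis))
    parts = []
    for axis in range(3):
        angles = coords[:, axis:axis + 1] * freqs  # shape (N, per_axis)
        parts.append(np.sin(angles))
        parts.append(np.cos(angles))
    return np.concatenate(parts, axis=1)

# A 64**3 volume with 16**3 patches yields a 4x4x4 grid of 64 patches.
volume = np.random.rand(64, 64, 64).astype(np.float32)
patches, coords = extract_patches(volume, patch=16)
emb = position_embedding(coords, dim=36)
```

In the paper's setting these embeddings would condition the diffusion network so that image-level and patch-level denoising share consistent spatial context; the sketch only shows the bookkeeping.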
Problem

Research questions and friction points this paper is trying to address.

Existing medical diffusion models are confined to specific anatomical regions, applications, and limited datasets
Isolated, task-specific models generalize poorly and must be rebuilt for each new task
No single foundation model handles diverse downstream tasks such as denoising, anomaly detection, and synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-trains a single 3D diffusion foundation model on multi-dataset CT spanning head to abdomen
Integrates image-level and patch-level processing, with position embeddings encoding multi-level spatial relationships
Adapts rapidly to new tasks by fine-tuning with ControlNet under task-specific conditions
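The ControlNet-style adaptation above follows a standard pattern: the pretrained diffusion backbone is frozen, a trainable branch receives the task-specific condition, and its contribution enters through a zero-initialized projection, so fine-tuning starts exactly at the foundation model's behavior. Here is a minimal NumPy sketch of that pattern (toy dense layers standing in for the diffusion network; this is an illustration of the general technique, not MedDiff-FM's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

class Linear:
    """Tiny dense layer standing in for a diffusion network block."""
    def __init__(self, d_in, d_out, zero_init=False):
        self.W = np.zeros((d_in, d_out)) if zero_init else rng.normal(0, 0.1, (d_in, d_out))
        self.b = np.zeros(d_out)
    def __call__(self, x):
        return x @ self.W + self.b

d = 8
backbone = Linear(d, d)                      # frozen pretrained denoiser (stand-in)
control_in = Linear(2 * d, d)                # trainable branch: sees input + condition
control_out = Linear(d, d, zero_init=True)   # zero-init: no effect before training

def denoise(x, cond):
    """Frozen backbone output plus the (initially silent) control branch."""
    base = backbone(x)
    ctrl = control_out(np.tanh(control_in(np.concatenate([x, cond], axis=-1))))
    return base + ctrl

x = rng.normal(size=(4, d))       # noisy latents
cond = rng.normal(size=(4, d))    # task-specific condition (e.g. lesion-mask features)
# Before any fine-tuning, conditioning changes nothing:
assert np.allclose(denoise(x, cond), backbone(x))
```

The zero-initialized output layer is the key design choice: gradients flow into the control branch during fine-tuning, but the pretrained model's outputs are untouched at step zero, which is what makes rapid, stable task adaptation possible.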