AI Summary
This study addresses the limited accuracy of synthetic CT (sCT) generation for MRI- and CBCT-guided adaptive radiotherapy. We propose what is described as the first 3D flow-matching framework tailored to multimodal medical image synthesis: a Gaussian noise volume is mapped end-to-end to an sCT by integrating a learned velocity field conditioned on the input image, so the same framework covers both MRI-to-sCT and CBCT-to-sCT translation. A lightweight 3D encoder extracts the conditioning features from the input, which keeps generation efficient while supporting global anatomical fidelity. Evaluated on the SynthRAD2025 challenge, the model accurately reconstructs large-scale anatomical structures in the abdomen, head and neck, and thorax; however, fine anatomical details remain limited by the relatively low training and output resolution. The core contribution is the extension of the flow-matching paradigm to cross-modal 3D medical image synthesis, offering a framework for adaptive radiotherapy with reduced patient radiation exposure.
Abstract
Generating synthetic CT (sCT) from MRI or CBCT plays a crucial role in enabling MRI-only and CBCT-based adaptive radiotherapy, improving treatment precision while reducing patient radiation exposure. To address this task, we adopt a fully 3D Flow Matching (FM) framework, motivated by recent work demonstrating FM's efficiency in producing high-quality images. In our approach, a Gaussian noise volume is transformed into an sCT image by integrating a learned FM velocity field, conditioned on features extracted from the input MRI or CBCT using a lightweight 3D encoder. We evaluated the method on the SynthRAD2025 Challenge benchmark, training separate models for MRI $\rightarrow$ sCT and CBCT $\rightarrow$ sCT across three anatomical regions: abdomen, head and neck, and thorax. Validation and testing were performed through the challenge submission system. The results indicate that the method accurately reconstructs global anatomical structures; however, preservation of fine details was limited, primarily due to the relatively low training resolution imposed by memory and runtime constraints. Future work will explore patch-based training and latent-space flow models to improve resolution and local structural fidelity.
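The sampling procedure described in the abstract (start from a Gaussian noise volume and integrate a conditioned velocity field) can be illustrated with a minimal NumPy sketch. Everything named here is an assumption made for illustration, not the authors' implementation: `encode_condition` stands in for the lightweight 3D encoder, `toy_velocity` replaces the trained network with a closed-form field, the straight-line interpolation path in `flow_matching_pair` is one common flow-matching choice that the abstract does not confirm, and fixed-step Euler integration is only one possible solver.

```python
import numpy as np


def encode_condition(volume):
    """Hypothetical stand-in for the lightweight 3D encoder (identity features)."""
    return volume.astype(np.float64)


def flow_matching_pair(x0, x1, t):
    """Training pair under an ASSUMED straight-line path x_t = (1 - t) * x0 + t * x1.

    x0 is the noise volume, x1 the ground-truth CT; a network would be
    regressed onto the returned target velocity at the point x_t.
    """
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target


def toy_velocity(x, t, cond_features):
    """Closed-form placeholder for the learned velocity field v(x, t, c).

    It pretends the target sCT is a fixed function of the condition and
    returns the straight-line velocity that points from x toward it.
    """
    target = 2.0 * cond_features - 1.0  # toy "sCT", purely illustrative
    return (target - x) / (1.0 - t)


def sample_sct(cond_volume, velocity_fn, num_steps=8, seed=0):
    """Integrate dx/dt = v(x, t, c) from t = 0 to t = 1 with explicit Euler steps."""
    rng = np.random.default_rng(seed)
    cond = encode_condition(cond_volume)
    x = rng.standard_normal(cond_volume.shape)  # Gaussian noise volume
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t never reaches 1, so toy_velocity never divides by zero
        x = x + dt * velocity_fn(x, t, cond)
    return x


if __name__ == "__main__":
    mri = np.linspace(0.0, 1.0, 4 * 4 * 4).reshape(4, 4, 4)  # tiny fake input volume
    sct = sample_sct(mri, toy_velocity)
    print(sct.shape, float(np.abs(sct - (2.0 * mri - 1.0)).max()))
```

In a real system the velocity function would be a trained 3D network and the volumes would be far larger, which is where the memory and runtime constraints mentioned in the abstract come from; the number of integration steps trades sampling cost against integration accuracy.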