🤖 AI Summary
Current IVF pregnancy prediction models struggle to integrate sequential embryo images with parental fertility tabular data, which limits their accuracy. To address this, we propose a multimodal disentangled fusion framework: (1) a disentanglement module explicitly separates modality-shared and modality-specific features; (2) spatiotemporal positional encoding captures embryonic developmental dynamics; (3) a Tabular Transformer extracts structured representations from clinical fertility metrics; and (4) cross-modal alignment enables deep feature integration. Evaluated on a new dataset of 4,046 IVF cases from Southern Medical University, our method outperforms state-of-the-art approaches. When applied to an ophthalmic disease prediction task, it also performs well, suggesting that the framework generalizes beyond IVF to other medical domains.
📝 Abstract
Temporal embryo images and parental fertility table indicators are both valuable for pregnancy prediction in in vitro fertilization embryo transfer (IVF-ET). However, current machine learning models do not fully exploit the complementary information between the two modalities to improve pregnancy prediction performance. In this paper, we propose a decoupling fusion network, DeFusion, that effectively integrates multi-modal information for IVF-ET pregnancy prediction. Specifically, we propose a decoupling fusion module that separates the information from the different modalities into related and unrelated components, enabling a finer-grained fusion. We also fuse the temporal embryo images using a spatial-temporal position encoding and extract fertility table indicator information with a table transformer. To evaluate our model, we use a new dataset of 4,046 cases collected from Southern Medical University. Experiments show that our model outperforms state-of-the-art methods. In addition, its performance on an eye disease prediction dataset indicates good generalization. Our code and dataset are available at https://github.com/Ou-Young-1999/DFNet.
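The decoupling idea described above can be illustrated with a minimal NumPy sketch. This is not the DeFusion implementation: the random projection matrices stand in for learned layers, and the function names (`decouple_and_fuse`, `cosine_alignment_loss`), the averaging of related parts, and the cosine-based alignment term are all assumptions made for illustration. It only shows the general pattern of splitting each modality's embedding into a "related" part and an "unrelated" part before fusing them.

```python
import numpy as np

rng = np.random.default_rng(0)


def linear(in_dim, out_dim):
    """Random (untrained) weights standing in for a learned projection."""
    return rng.normal(scale=in_dim ** -0.5, size=(in_dim, out_dim))


def decouple_and_fuse(img_feat, tab_feat, weights):
    """Split each modality into related/unrelated parts, then fuse them.

    Hypothetical design: related parts are averaged into one shared vector,
    unrelated parts are kept per modality, and everything is concatenated.
    """
    img_rel = img_feat @ weights["img_rel"]
    img_unrel = img_feat @ weights["img_unrel"]
    tab_rel = tab_feat @ weights["tab_rel"]
    tab_unrel = tab_feat @ weights["tab_unrel"]

    shared = 0.5 * (img_rel + tab_rel)
    fused = np.concatenate([shared, img_unrel, tab_unrel], axis=-1)
    return fused, (img_rel, tab_rel)


def cosine_alignment_loss(a, b, eps=1e-8):
    """1 - mean cosine similarity; near 0 when the two inputs are aligned."""
    num = (a * b).sum(axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return float(1.0 - (num / den).mean())


# Illustrative sizes only; none of these come from the paper.
batch, img_dim, tab_dim, hid = 4, 32, 10, 8
weights = {
    "img_rel": linear(img_dim, hid),
    "img_unrel": linear(img_dim, hid),
    "tab_rel": linear(tab_dim, hid),
    "tab_unrel": linear(tab_dim, hid),
}

img_feat = rng.normal(size=(batch, img_dim))  # pooled embryo-image embedding
tab_feat = rng.normal(size=(batch, tab_dim))  # fertility-indicator embedding

fused, (img_rel, tab_rel) = decouple_and_fuse(img_feat, tab_feat, weights)
align_loss = cosine_alignment_loss(img_rel, tab_rel)
print(fused.shape)  # (4, 24)
```

In a trained model, a term such as `align_loss` would typically be minimized so that the related parts of the two modalities agree, while the unrelated parts remain free to carry modality-specific information. The authors' actual objective and architecture are in the linked repository.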