🤖 AI Summary
Current survival prediction models are limited by unimodal data inputs, resulting in suboptimal prognostic accuracy and compromising clinical decision-making and resource allocation. To address this, we propose a novel multimodal fusion framework that jointly models four heterogeneous data modalities—whole-slide pathology images, gene expression profiles, demographic features, and treatment regimens—for the first time. Methodologically, our approach employs three dedicated encoders, a residual orthogonal decomposition module for cross-modal feature disentanglement, a unified fusion mechanism, and a balanced negative log-likelihood loss that jointly optimizes discriminative performance and patient-level fairness. Evaluated on five TCGA cancer types (BLCA, BRCA, GBMLGG, LUAD, UCEC), our model achieves significant improvements over state-of-the-art methods. The source code is publicly available.
📝 Abstract
Survival prediction is a crucial task in the medical field and is essential for optimizing treatment options and resource allocation. However, current methods often rely on limited data modalities, resulting in suboptimal performance. In this paper, we propose an Integrated Cross-modal Fusion Network (ICFNet) that integrates histopathology whole slide images, genomic expression profiles, patient demographics, and treatment protocols. Specifically, three types of encoders, a residual orthogonal decomposition module and a unification fusion module are employed to merge multi-modal features to enhance prediction accuracy. Additionally, a balanced negative log-likelihood loss function is designed to ensure fair training across different patients. Extensive experiments demonstrate that our ICFNet outperforms state-of-the-art algorithms on five public TCGA datasets, including BLCA, BRCA, GBMLGG, LUAD, and UCEC, and shows its potential to support clinical decision-making and advance precision medicine. The codes are available at: https://github.com/binging512/ICFNet.