A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis

📅 2025-01-13
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing cancer prognosis models suffer from poor generalizability, unimodal reliance, and limited cross-cancer transferability. To address these limitations, we propose UMPSNet, the first unified multimodal pan-cancer prognostic prediction framework. UMPSNet jointly encodes whole-slide pathology images, gene expression profiles, and structured clinical text (e.g., demographics, cancer type, treatment, diagnosis). It introduces two key innovations: (1) an optimal transport (OT)-driven cross-modal alignment attention mechanism to enforce semantic consistency across heterogeneous modalities, and (2) a guided soft mixture-of-experts (GMoE) architecture enabling data-distribution-aware fusion and strong single-model generalization. Evaluated on multiple independent pan-cancer cohorts, UMPSNet consistently outperforms state-of-the-art methods in survival prediction across diverse cancer types, demonstrating superior efficacy, robustness, and cross-cancer adaptability.

πŸ“ Abstract
The prognostic task is of great importance, as it is closely related to patient survival analysis, the optimization of treatment plans, and the allocation of medical resources. Existing prognostic models have shown promising results on specific datasets, but they have limitations in two respects. On the one hand, they explore only certain types of modal data, such as patient histopathology whole-slide images (WSIs) and gene expression profiles. On the other hand, they adopt a per-cancer-per-model paradigm, meaning that a trained model can only predict the prognosis of a single cancer type, which results in weak generalization ability. In this paper, a deep-learning-based model named UMPSNet is proposed. Specifically, to comprehensively characterize a patient's condition, in addition to constructing separate encoders for histopathology images and genomic expression profiles, UMPSNet integrates four types of important metadata (demographic information, cancer type, treatment protocols, and diagnosis results) into text templates, and then introduces a text encoder to extract textual features. In addition, an optimal transport (OT)-based attention mechanism is utilized to align and fuse the features of the different modalities. Furthermore, a guided soft mixture-of-experts (GMoE) mechanism is introduced to effectively address the distribution differences among multiple cancer datasets. By incorporating the multi-modality of patient data and joint training, UMPSNet outperforms all SOTA approaches and, moreover, demonstrates the effectiveness and generalization ability of the proposed learning paradigm of a single model for multiple cancer types. The code of UMPSNet is available at https://github.com/binging512/UMPSNet.
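As a rough illustration of the OT-based alignment idea described in the abstract (a minimal sketch, not the authors' implementation; the function names, the uniform marginals, and the cosine cost are all assumptions), entropy-regularized optimal transport via Sinkhorn iterations can produce a transport plan between token sets from two modalities, which can then serve as cross-modal attention weights:

```python
import numpy as np

def sinkhorn(cost, reg=0.1, n_iters=100):
    """Entropy-regularized OT plan between two uniform marginals."""
    n, m = cost.shape
    a, b = np.ones(n) / n, np.ones(m) / m   # uniform marginals (assumption)
    K = np.exp(-cost / reg)                 # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                # alternating scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)      # transport plan, total mass ~ 1

def ot_align(wsi_tokens, gene_tokens, reg=0.1):
    """Re-express WSI tokens as OT-weighted mixtures of genomic tokens."""
    # cosine cost between L2-normalized token embeddings (assumption)
    A = wsi_tokens / np.linalg.norm(wsi_tokens, axis=1, keepdims=True)
    B = gene_tokens / np.linalg.norm(gene_tokens, axis=1, keepdims=True)
    plan = sinkhorn(1.0 - A @ B.T, reg)
    # row-normalize the plan so it acts like an attention matrix
    attn = plan / plan.sum(axis=1, keepdims=True)
    return attn @ gene_tokens
```

In this reading, the transport plan plays the role of the attention map: each pathology token attends to genomic tokens in proportion to how much mass OT moves between them, which softly enforces the cross-modal consistency the paper targets.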
Problem

Research questions and friction points this paper is trying to address.

Cancer Prognosis
Machine Learning
Multi-Cancer Analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

UMPSNet
Multi-modal Information Analysis
Optimal Transport Attention Mechanism
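The guided soft mixture-of-experts idea can likewise be sketched (a hypothetical minimal version, assuming the gate is conditioned on a guidance embedding such as a learned cancer-type vector; class and parameter names are illustrative, not from the paper):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class GuidedSoftMoE:
    """Soft mixture of experts whose gate is steered by a guidance
    embedding, letting one shared model adapt its fusion weights to
    each cancer cohort's data distribution."""
    def __init__(self, dim, n_experts, seed=0):
        rng = np.random.default_rng(seed)
        # one linear expert per slot; the gate sees [feature; guidance]
        self.W_experts = rng.normal(scale=0.1, size=(n_experts, dim, dim))
        self.W_gate = rng.normal(scale=0.1, size=(2 * dim, n_experts))

    def __call__(self, x, guide):
        logits = np.concatenate([x, guide]) @ self.W_gate
        w = softmax(logits)                  # soft routing: all experts contribute
        expert_out = np.stack([x @ We for We in self.W_experts])
        return (w[:, None] * expert_out).sum(axis=0)
```

The soft (rather than top-k) routing means every expert receives gradient signal from every cohort during joint training, which is one plausible reason a single model can cover multiple cancer types.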
Binyu Zhang
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
Shichao Li
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
Zhu Meng
Beijing University of Posts and Telecommunications
Medical Image Processing
Limei Guo
Department of Pathology, School of Basic Medical Sciences, Third Hospital, Peking University, Beijing, China
Zhicheng Zhao
Associate Professor at the School of Artificial Intelligence, Anhui University
Computer Vision