A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis

📅 2025-01-13
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing cancer prognosis models suffer from poor generalizability, unimodal reliance, and limited cross-cancer transferability. To address these limitations, we propose UMPSNet, the first unified multimodal pan-cancer prognostic prediction framework. UMPSNet jointly encodes whole-slide pathology images, gene expression profiles, and structured clinical text (e.g., demographics, cancer type, treatment, diagnosis). It introduces two key innovations: (1) an optimal transport (OT)-driven cross-modal alignment attention mechanism to enforce semantic consistency across heterogeneous modalities, and (2) a guided soft mixture-of-experts (GMoE) architecture enabling data-distribution-aware fusion and strong single-model generalization. Evaluated on multiple independent pan-cancer cohorts, UMPSNet consistently outperforms state-of-the-art methods in survival prediction across diverse cancer types, demonstrating superior efficacy, robustness, and cross-cancer adaptability.

πŸ“ Abstract
The prognostic task is of great importance, as it is closely related to patient survival analysis, the optimization of treatment plans, and the allocation of medical resources. Existing prognostic models have shown promising results on specific datasets, but they have limitations in two respects. On the one hand, they explore only certain types of modal data, such as patient histopathology whole-slide images (WSIs) and gene expression profiles. On the other hand, they adopt a per-cancer-per-model paradigm, meaning that a trained model can only predict the prognosis of a single cancer type, which results in weak generalization ability. In this paper, a deep-learning-based model named UMPSNet is proposed. Specifically, to comprehensively characterize a patient's condition, in addition to constructing separate encoders for histopathology images and genomic expression profiles, UMPSNet integrates four types of important metadata (demographic information, cancer type, treatment protocols, and diagnosis results) into text templates, and then introduces a text encoder to extract textual features. In addition, an optimal transport (OT)-based attention mechanism is utilized to align and fuse the features of the different modalities. Furthermore, a guided soft mixture-of-experts (GMoE) mechanism is introduced to effectively address the distribution differences among multiple cancer datasets. By incorporating the multi-modality of patient data and joint training, UMPSNet outperforms all SOTA approaches and, moreover, demonstrates the effectiveness and generalization ability of the proposed learning paradigm of a single model for multiple cancer types. The code of UMPSNet is available at https://github.com/binging512/UMPSNet.
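As a rough illustration of the OT-based alignment idea described in the abstract (a minimal sketch, not the authors' implementation; the function names, the uniform marginals, and the cosine cost are all assumptions), entropy-regularized optimal transport via Sinkhorn iterations can produce a transport plan between token sets from two modalities, which can then serve as cross-modal attention weights:

```python
import numpy as np

def sinkhorn(cost, reg=0.1, n_iters=100):
    """Entropy-regularized OT plan between two uniform marginals."""
    n, m = cost.shape
    a, b = np.ones(n) / n, np.ones(m) / m   # uniform marginals (assumption)
    K = np.exp(-cost / reg)                 # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                # alternating scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)      # transport plan, total mass ~ 1

def ot_align(wsi_tokens, gene_tokens, reg=0.1):
    """Re-express WSI tokens as OT-weighted mixtures of genomic tokens."""
    # cosine cost between L2-normalized token embeddings (assumption)
    A = wsi_tokens / np.linalg.norm(wsi_tokens, axis=1, keepdims=True)
    B = gene_tokens / np.linalg.norm(gene_tokens, axis=1, keepdims=True)
    plan = sinkhorn(1.0 - A @ B.T, reg)
    # row-normalize the plan so it acts like an attention matrix
    attn = plan / plan.sum(axis=1, keepdims=True)
    return attn @ gene_tokens
```

In this reading, the transport plan plays the role of the attention map: each pathology token attends to genomic tokens in proportion to how much mass OT moves between them, which softly enforces the cross-modal consistency the paper targets.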
Problem

Research questions and friction points this paper is trying to address.

Cancer Prognosis
Machine Learning
Multi-Cancer Analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

UMPSNet
Multi-modal Information Analysis
Optimal Transport Attention Mechanism
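The guided soft mixture-of-experts idea can likewise be sketched (a hypothetical minimal version, assuming the gate is conditioned on a guidance embedding such as a learned cancer-type vector; class and parameter names are illustrative, not from the paper):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class GuidedSoftMoE:
    """Soft mixture of experts whose gate is steered by a guidance
    embedding, letting one shared model adapt its fusion weights to
    each cancer cohort's data distribution."""
    def __init__(self, dim, n_experts, seed=0):
        rng = np.random.default_rng(seed)
        # one linear expert per slot; the gate sees [feature; guidance]
        self.W_experts = rng.normal(scale=0.1, size=(n_experts, dim, dim))
        self.W_gate = rng.normal(scale=0.1, size=(2 * dim, n_experts))

    def __call__(self, x, guide):
        logits = np.concatenate([x, guide]) @ self.W_gate
        w = softmax(logits)                  # soft routing: all experts contribute
        expert_out = np.stack([x @ We for We in self.W_experts])
        return (w[:, None] * expert_out).sum(axis=0)
```

The soft (rather than top-k) routing means every expert receives gradient signal from every cohort during joint training, which is one plausible reason a single model can cover multiple cancer types.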
Binyu Zhang
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
Shichao Li
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
Zhu Meng
Beijing University of Posts and Telecommunications
Medical Image Processing
Limei Guo
Department of Pathology, School of Basic Medical Sciences, Third Hospital, Peking University, Beijing, China
Zhicheng Zhao
Associate Professor at the School of Artificial Intelligence, Anhui University
Computer Vision