Multimodal Survival Modeling in the Age of Foundation Models

📅 2025-05-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses cancer survival prediction by proposing the first multimodal survival modeling framework integrating unstructured pathology report text. To handle heterogeneous data—including genomic profiles, histopathological images, and free-text pathology reports—from The Cancer Genome Atlas (TCGA), we leverage foundation models (e.g., BioMedLM) for zero-shot feature extraction and systematically incorporate pathology text embeddings into the Cox proportional hazards model—a novel application in survival analysis. Methodologically, we introduce cross-modal feature disentanglement, plug-and-play fusion strategies (concatenation and attention-based), and a quantitative evaluation framework for clinical text summarization fidelity and hallucination. Experiments demonstrate that multimodal integration significantly outperforms unimodal baselines; incorporating pathology text embeddings improves the concordance index (C-index) by 3.2%, validating the expressive power, robustness, and clinical interpretability of foundation-model-derived features in survival modeling.

Technology Category

Application Category

📝 Abstract
The Cancer Genome Atlas (TCGA) has enabled novel discoveries and served as a large-scale reference through its harmonized genomics, clinical, and image data. Prior studies have trained bespoke cancer survival prediction models from unimodal or multimodal TCGA data. A modern paradigm in biomedical deep learning is the development of foundation models (FMs) to derive meaningful feature embeddings, agnostic to a specific modeling task. Biomedical text especially has seen growing development of FMs. While TCGA contains free-text data as pathology reports, these have been historically underutilized. Here, we investigate the feasibility of training classical, multimodal survival models over zero-shot embeddings extracted by FMs. We show the ease and additive effect of multimodal fusion, outperforming unimodal models. We demonstrate the benefit of including pathology report text and rigorously evaluate the effect of model-based text summarization and hallucination. Overall, we modernize survival modeling by leveraging FMs and information extraction from pathology reports.
Problem

Research questions and friction points this paper is trying to address.

Leveraging foundation models for multimodal cancer survival prediction
Utilizing underused pathology reports to enhance survival modeling
Evaluating text summarization and hallucination in model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging foundation models for survival prediction
Multimodal fusion enhances cancer survival models
Utilizing pathology reports via text embeddings
🔎 Similar Papers
No similar papers found.
Steven Song
Steven Song
University of Chicago
machine learning for healthcare
M
Morgan Borjigin-Wang
Center for Translational Data Science, University of Chicago, Chicago IL; Google, Chicago IL
I
Irene Madejski
Center for Translational Data Science, University of Chicago, Chicago IL; Department of Computer Science, University of Chicago, Chicago IL
R
Robert L. Grossman
Center for Translational Data Science, University of Chicago, Chicago IL; Department of Computer Science, University of Chicago, Chicago IL; Section of Biomedical Data Science, Department of Medicine, University of Chicago, Chicago IL