🤖 AI Summary
This study addresses the challenge of accurately classifying pediatric central nervous system tumors, a task hindered by histological complexity and data scarcity, which limit the ability of existing foundation models to integrate H&E whole-slide images, clinical text, and cellular microstructure. To this end, we propose the first interpretable multimodal mixture-of-experts framework tailored to this task, built on state-of-the-art foundation models. Our approach employs an input-adaptive gating mechanism to dynamically model the uniqueness, redundancy, and synergy among modalities, yielding sample-level interpretability. Evaluated on an internal pediatric brain tumor (PBT) dataset, our method achieves a macro F1-score of 0.799 (+0.037); on TCGA, it reaches 0.709 (+0.041) when augmented with cellular graphs, significantly outperforming single-modality state-of-the-art methods. The model further uncovers critical modality interactions, offering transparent decision support for rare tumor subtypes.
📝 Abstract
Accurate classification of pediatric central nervous system tumors remains challenging due to histological complexity and limited training data. While pathology foundation models have advanced whole-slide image (WSI) analysis, they often fail to leverage the rich, complementary information found in clinical text and tissue microarchitecture. To this end, we propose PathMoE, an interpretable multimodal framework that integrates H&E slides, pathology reports, and nuclei-level cell graphs via an interaction-aware mixture-of-experts architecture built on state-of-the-art foundation models for each modality. By training specialized experts to capture modality uniqueness, redundancy, and synergy, PathMoE employs an input-dependent gating mechanism that dynamically weights these interactions, providing sample-level interpretability. We evaluate our framework on two dataset-specific classification tasks: one on an internal pediatric brain tumor (PBT) dataset and one on external TCGA datasets. PathMoE improves macro-F1 from 0.762 to 0.799 (+0.037) on PBT when integrating WSI, text, and graph modalities; on TCGA, augmenting WSI with graph knowledge improves macro-F1 from 0.668 to 0.709 (+0.041). These results demonstrate significant performance gains over state-of-the-art image-only baselines while revealing the specific modality interactions driving individual predictions. This interpretability is particularly critical for rare tumor subtypes, where transparent model reasoning is essential for clinical trust and diagnostic validation.
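The input-dependent gating over interaction experts described in the abstract can be sketched roughly as follows. This is a minimal illustration only: the dimensions, linear experts, and softmax gate are assumptions for demonstration, not PathMoE's actual architecture, which is not detailed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fused multimodal embedding (WSI + text + cell graph).
dim, n_classes = 16, 4
x = rng.normal(size=dim)

# Three interaction experts (uniqueness, redundancy, synergy),
# each sketched as a simple linear map from the embedding to class logits.
experts = {name: rng.normal(size=(n_classes, dim))
           for name in ("uniqueness", "redundancy", "synergy")}

# Input-dependent gate: a linear score per expert, normalized by softmax,
# so the mixture weights change with each input sample.
W_gate = rng.normal(size=(len(experts), dim))
scores = W_gate @ x
gate = np.exp(scores - scores.max())
gate /= gate.sum()

# Final prediction: gate-weighted mixture of the expert outputs.
logits = sum(w * (W @ x) for w, W in zip(gate, experts.values()))

# The gate weights themselves provide sample-level interpretability:
# they show which interaction type drove this particular prediction.
for name, w in zip(experts, gate):
    print(f"{name}: {w:.3f}")
print("predicted class:", int(np.argmax(logits)))
```

Because the gate is a function of the input, two samples can route through different interaction experts, which is what yields per-sample interpretability rather than a single global modality weighting.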