🤖 AI Summary
This work addresses the challenge of jointly handling missing modalities and modeling longitudinal dynamics in clinical multimodal learning by proposing LongMoE, a unified framework that seamlessly integrates modality imputation with disease trajectory modeling. LongMoE uniquely combines context-aware imputation, frequency-domain attention-based tokenization, a trajectory-aware Transformer encoder, and a conditional sparse Mixture-of-Experts routing mechanism to enable joint modeling and personalized inference on incomplete, irregularly sampled multimodal time-series data. Experimental results on the ADNI, OASIS-3, and MIMIC-IV datasets demonstrate that LongMoE achieves strong robustness under missing-modality conditions while remaining competitive even in full-modality settings.
📝 Abstract
Multimodal clinical learning is increasingly important for integrating diverse patient data, including imaging, text, and personalised health records. However, it faces two fundamental challenges: i) modality missingness, where arbitrary subsets of modalities are unavailable at a given patient visit, ii) longitudinal dynamics, where the diagnostic significance of an observation depends on the patient's evolving disease trajectory over time. Existing methods address these challenges in isolation: missing-modality frameworks treat each visit as an independent static snapshot and discard temporal context, while longitudinal models often assume complete modality availability and degrade under systematic modality incompleteness. We propose LongMoE (Longitudinal Mixture-of-Experts), the unified framework to jointly address both challenges. LongMoE combines a context-aware imputation module with an attentional tokenization module that captures frequency-domain temporal patterns across irregular visit sequences, a trajectory-aware encoder for modeling disease progression, and context-conditioned Sparse MoE routing for patient-specific expert selection. Experiments on ADNI, OASIS-3, and MIMIC-IV show that LongMoE improves robustness under missing or weak contemporaneous modalities and remains competitive in full-modality settings, establishing a strong foundation for longitudinally-aware multimodal clinical learning.