🤖 AI Summary
In clinical multimodal diagnosis, class imbalance severely impairs model performance on minority diseases; conventional resampling and loss-reweighting approaches are prone to overfitting or underfitting and neglect cross-modal interactions. To address this, we propose CLIMD, a curriculum learning framework with two components. A multimodal curriculum measurer jointly scores samples by intra-modal confidence and inter-modal complementarity, so that training focuses on key samples and progresses toward harder ones. A class distribution-guided training scheduler then lets the model adapt gradually to the imbalanced class distribution. Evaluated across multiple multimodal medical benchmarks, CLIMD consistently improves minority-class diagnostic accuracy, achieving average F1-score gains of 3.2–7.8%. As a plug-and-play framework, it can be integrated into existing architectures.
📝 Abstract
Clinicians usually combine information from multiple sources to reach the most accurate diagnosis, which has sparked increasing interest in multimodal deep learning for diagnosis. In real clinical scenarios, however, differences in incidence rates mean that multimodal medical data are commonly class-imbalanced, which makes it difficult to adequately learn the features of minority classes. Most existing methods tackle this issue with resampling or loss reweighting, but they are prone to overfitting or underfitting and fail to capture cross-modal interactions. We therefore propose a Curriculum Learning framework for Imbalanced Multimodal Diagnosis (CLIMD). Specifically, we first design a multimodal curriculum measurer that combines two indicators, intra-modal confidence and inter-modal complementarity, so that the model focuses on key samples and gradually adapts to complex category distributions. We then introduce a class distribution-guided training scheduler, which enables the model to progressively adapt to the imbalanced class distribution during training. Extensive experiments on multiple multimodal medical datasets demonstrate that the proposed method outperforms state-of-the-art approaches across various metrics and excels at handling imbalanced multimodal medical data. Furthermore, as a plug-and-play curriculum learning (CL) framework, CLIMD can be easily integrated into other models, offering a promising path toward more accurate multimodal disease diagnosis. Code is publicly available at https://github.com/KHan-UJS/CLIMD.
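The abstract names the two components but gives no formulas, so the following is only a minimal sketch of how a curriculum measurer and a class distribution-guided scheduler could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`difficulty`, `target_distribution`, `select_pool`), the weight `alpha`, the definition of complementarity as the gain of the fused prediction over the best single modality, and the linear blend from the empirical class distribution toward a uniform one.

```python
from collections import Counter


def difficulty(p_modal, p_fused, alpha=0.5):
    """Hypothetical difficulty score in [0, 1]; higher means harder.

    p_modal: true-class probabilities from each unimodal head.
    p_fused: true-class probability from the fused (multimodal) head.
    """
    confidence = sum(p_modal) / len(p_modal)            # intra-modal confidence
    complementarity = max(0.0, p_fused - max(p_modal))  # assumed: gain from fusion
    return 1.0 - (alpha * confidence + (1.0 - alpha) * complementarity)


def target_distribution(labels, epoch, total_epochs):
    """Assumed schedule: blend the empirical class mix toward uniform over epochs."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    lam = epoch / max(1, total_epochs - 1)  # 0 at the first epoch, 1 at the last
    return {c: (1.0 - lam) * counts[c] / n + lam / k for c in counts}


def select_pool(labels, scores, epoch, total_epochs, pool_size):
    """Pick the easiest samples of each class to match the scheduled class mix."""
    dist = target_distribution(labels, epoch, total_epochs)
    chosen = []
    for c, share in dist.items():
        # Indices of class c, ordered from easiest to hardest.
        idx = sorted((i for i, y in enumerate(labels) if y == c),
                     key=lambda i: scores[i])
        quota = max(1, round(share * pool_size))
        chosen.extend(idx[:quota])
    return sorted(chosen)


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 0, 0, 1, 1]                  # class 1 is the minority
    scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # precomputed difficulties
    for epoch in range(3):
        print(epoch, select_pool(labels, scores, epoch, 3, pool_size=4))
```

In this sketch, early epochs draw a pool that mirrors the imbalanced data, and later epochs draw an increasingly class-balanced pool, always preferring low-difficulty samples within each class. The CLIMD repository linked above is the reference for the actual measurer and scheduler definitions.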