🤖 AI Summary
Conventional multiple instance learning (MIL) degrades significantly when instance counts in cerebral hematoma CT images grow to 256, owing to spurious correlations and sensitivity to instance scale. Method: This work integrates self-supervised vision pretraining models, specifically DINO and MAE, as upstream pretraining for high-instance-count medical MIL. We propose a feature-level attention-based aggregation mechanism coupled with downstream fine-tuning to mitigate spurious correlations and overcome MIL's scalability bottleneck. Contribution/Results: On hypodensity marker classification, the approach improves accuracy by 5% to 13% and the F1 measure by 40% to 55%. It markedly enhances robustness in small-target detection and highly redundant imaging scenarios. This study establishes a scalable, self-supervised modeling framework for large-scale medical MIL, advancing practical deployment in dense-instance clinical imaging tasks.
📝 Abstract
In deep multiple instance learning (MIL), the number of instances per bag depends on the dataset. For histopathology images, deep MIL models typically assume hundreds to thousands of instances per bag. However, when the number of instances in a bag reaches 256, as in brain hematoma CT, learning becomes extremely difficult. To overcome this problem, we propose using a model pre-trained with self-supervised learning and fine-tuning the multi-instance learner as a downstream task. With this method, even when the original target task suffers from the spurious correlation problem, we show improvements of 5% to 13% in accuracy and 40% to 55% in the F1 measure for hypodensity marker classification in brain hematoma CT.
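The feature-level attention-based aggregation mentioned above can be illustrated with a minimal sketch. This is not the paper's exact implementation; it follows the standard attention-pooling formulation for MIL, where per-instance embeddings (e.g. from a frozen DINO or MAE backbone) are scored, softmax-normalized, and combined into one bag-level feature. The projection `w` and scoring vector `v` are illustrative placeholders for learned parameters.

```python
import numpy as np

def attention_mil_pool(instance_feats, w, v):
    """Pool N instance embeddings into one bag embedding via attention.

    instance_feats: (N, D) per-instance features from an upstream
                    self-supervised encoder (hypothetical input here).
    w:              (D, H) learned projection matrix.
    v:              (H,)  learned scoring vector.
    Returns a (D,) bag-level feature vector.
    """
    # Score each instance: a_i = v^T tanh(w^T h_i)
    scores = np.tanh(instance_feats @ w) @ v        # shape (N,)
    # Softmax over instances (max-subtracted for numerical stability),
    # so the weights sum to 1 regardless of bag size (e.g. N = 256).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum collapses the bag to a single feature vector.
    return weights @ instance_feats                  # shape (D,)
```

Because the softmax normalizes over however many instances the bag contains, the pooled representation stays on a comparable scale as bags grow, which is one reason attention pooling is a common choice for large-bag MIL.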