🤖 AI Summary
This work addresses the lack of reliability assessment for multiple instance learning (MIL) models in computational pathology, specifically for whole-slide image (WSI) classification. We propose a systematic framework to quantify the consistency between model predictions and basic pathological domain knowledge. To this end, we design three reliability metrics and evaluate MIL models on three WSI datasets with region-level annotations. Our analysis shows that the structurally simple MEAN-POOL-INS model is more reliable than more complex MIL architectures, while also being computationally efficient. These properties make it a promising candidate for clinically trustworthy AI. The code for reproducing the results is publicly released.
📝 Abstract
Machine learning models have become integral to many fields, but their reliability, particularly in high-stakes domains, remains a critical concern. Reliability refers to the quality of being dependable and trustworthy. Reliable models consistently provide predictions aligned with basic domain knowledge, which makes their development and deployment particularly critical in healthcare applications. However, Multiple Instance Learning (MIL) models designed for Whole Slide Image (WSI) classification in computational pathology are rarely evaluated in terms of reliability. In this paper, we address this gap by comparing the reliability of MIL models using three proposed metrics, applied across three datasets with region-wise annotations. Our findings indicate that the mean pooling instance (MEAN-POOL-INS) model is more reliable than the other networks, despite its simple architectural design, and is also computationally efficient. The code for reproducing our results is available at github.com/tueimage/MIL-Reliability.
Keywords: Machine learning, Reliability, Whole Slide Image, Multiple Instance Learning, MEAN-POOL-INS.
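The abstract names the MEAN-POOL-INS model without describing it. As a rough illustration only, the sketch below assumes the common instance-level reading of mean pooling in MIL: each patch (instance) is scored independently and the slide-level (bag) score is the average of those patch scores. The linear scorer, the function name `mean_pool_instance_predict`, and the toy feature values are hypothetical and are not taken from the paper or its repository.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def mean_pool_instance_predict(
    bag: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> Tuple[float, List[float]]:
    """Score each instance independently, then average the scores.

    Assumption: a linear scorer stands in for whatever instance
    classifier the actual model uses.
    """
    if not bag:
        raise ValueError("bag must contain at least one instance")
    instance_scores = [
        sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        for features in bag
    ]
    # Bag-level prediction: plain mean over the instance predictions.
    bag_score = sum(instance_scores) / len(instance_scores)
    return bag_score, instance_scores


# Toy bag of three patches with 2-D features (hypothetical values).
bag = [[2.0, 0.0], [0.0, 0.0], [-2.0, 0.0]]
bag_score, instance_scores = mean_pool_instance_predict(
    bag, weights=[1.0, 0.0], bias=0.0
)
```

Under this reading, the per-instance scores are available alongside the bag score, so they could in principle be compared against region-level annotations; how the paper's three metrics actually do this is not specified in the abstract.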