Mask of truth: model sensitivity to unexpected regions of medical images

📅 2024-12-05
🏛️ Journal of Imaging Informatics in Medicine
📈 Citations: 0
Influential: 0
🤖 AI Summary
Medical AI models frequently exploit spurious correlations—such as imaging artifacts or background textures—rather than clinically relevant pathological features, severely compromising generalizability. This work systematically evaluates the reliance of CNNs on anatomical regions of interest (ROIs) in chest X-ray (PadChest) and fundus image (Chákṣu) classification tasks. Using precise ROI masking, we demonstrate that state-of-the-art models retain AUC scores significantly above chance even when all pathologically relevant regions are entirely occluded; remarkably, some models achieve higher performance on non-ROI subimages alone than on ROI subimages. Integrating SHAP-based interpretability, embedding-space similarity analysis, and expert radiologist validation, we identify for the first time the phenomenon of “lesion-agnostic robustness” in medical AI. Based on this finding, we propose a novel trustworthiness evaluation framework grounded in three orthogonal dimensions: interpretability, region sensitivity, and clinical plausibility.

📝 Abstract
The development of larger models for medical image analysis has led to increased performance. However, it has also affected our ability to explain and validate model decisions. Models can use non-relevant parts of images, also called spurious correlations or shortcuts, to obtain high performance on benchmark datasets but fail in real-world scenarios. In this work, we challenge the capacity of convolutional neural networks (CNNs) to classify chest X-rays and eye fundus images while masking out clinically relevant parts of the image. We show that all models trained on the PadChest dataset, irrespective of the masking strategy, obtain an area under the curve (AUC) above random. Moreover, the models trained on full images perform well on images without the region of interest (ROI), even better than on images containing only the ROI. We also reveal a possible spurious correlation in the Chákṣu dataset, although the performances there are more aligned with the expectation of an unbiased model. We go beyond performance analysis by applying the explainability method SHAP and analyzing embeddings. We also asked a radiology resident to interpret chest X-rays under different maskings to complement our findings with clinical knowledge.
Problem

Research questions and friction points this paper is trying to address.

Evaluates CNN sensitivity to non-relevant chest X-ray regions
Identifies spurious correlations in medical image datasets
Assesses model reliance on clinically irrelevant image features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Masking clinically relevant image parts for validation
Using SHAP for model explainability analysis
Analyzing embeddings to detect spurious correlations
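The masking protocol described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: `mask_roi` and the toy arrays are hypothetical names, and the actual pipeline operates on PadChest chest X-rays with anatomical segmentation masks. The idea is to produce two complementary variants of each image — one with the ROI occluded, one with everything *but* the ROI occluded — and compare model AUC on each against chance.

```python
import numpy as np

def mask_roi(image, roi_mask, keep="non_roi", fill=0.0):
    """Occlude either the ROI or everything outside it.

    image:    2-D grayscale array (H, W)
    roi_mask: boolean array (H, W), True inside the region of interest
    keep:     "non_roi" hides the ROI; "roi" hides everything else
    """
    out = image.copy()
    if keep == "non_roi":
        out[roi_mask] = fill       # occlude the clinically relevant region
    elif keep == "roi":
        out[~roi_mask] = fill      # occlude everything outside the ROI
    else:
        raise ValueError(f"unknown keep={keep!r}")
    return out

# Toy check: a 4x4 "image" with a 2x2 ROI in the top-left corner.
img = np.arange(16, dtype=float).reshape(4, 4)
roi = np.zeros((4, 4), dtype=bool)
roi[:2, :2] = True

no_roi = mask_roi(img, roi, keep="non_roi")   # ROI pixels zeroed
only_roi = mask_roi(img, roi, keep="roi")     # non-ROI pixels zeroed
assert no_roi[roi].sum() == 0
assert only_roi[~roi].sum() == 0
```

If a classifier still scores well above random AUC on the `keep="non_roi"` variant, its predictions cannot rest on the occluded pathology — the signature of a shortcut the paper probes with SHAP and embedding analysis.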
Théo Sourget
PhD Student, PURRlab, IT University of Copenhagen
Deep Learning · Medical Image Analysis · Fairness · Open Science · Meta-research
Michelle Hestbek-Moller
IT University of Copenhagen, Denmark
Amelia Jiménez-Sánchez
IT University of Copenhagen, Denmark
Jack Junchi Xu
Copenhagen University Hospital, Herlev and Gentofte, Denmark; Radiological AI Testcenter, Denmark
V. Cheplygina
IT University of Copenhagen, Denmark