SpurBreast: A Curated Dataset for Investigating Spurious Correlations in Real-world Breast MRI Classification

📅 2025-10-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In medical image classification, deep neural networks often exploit non-clinical spurious correlations, such as magnetic field strength and image orientation, which undermines generalizability and clinical reliability. To address the lack of controllable bias signals in existing datasets, we introduce SpurBreast, the first breast MRI dataset explicitly designed for spurious correlation research: it incorporates two tunable, non-clinical biases (magnetic field strength and image orientation) and provides a rigorously matched, bias-free reference subset. Leveraging multi-center data and controlled train/validation/test splits, we separate a global spurious signal (field strength, which influences the entire image) from a local one (orientation, which affects spatial alignment). Experiments demonstrate that models achieve high validation accuracy by leveraging spurious features but suffer substantial performance degradation on the bias-free test set, a critical risk for clinical deployment. The dataset and code are publicly released, establishing a new benchmark for studying generalization, uncertainty quantification, and robustness in trustworthy AI for medical imaging.

📝 Abstract

Deep neural networks (DNNs) have demonstrated remarkable success in medical imaging, yet their real-world deployment remains challenging due to spurious correlations, where models can learn non-clinical features instead of meaningful medical patterns. Existing medical imaging datasets are not designed to systematically study this issue, largely due to restrictive licensing and limited supplementary patient data. To address this gap, we introduce SpurBreast, a curated breast MRI dataset that intentionally incorporates spurious correlations to evaluate their impact on model performance. Analyzing over 100 features involving patient, device, and imaging protocol, we identify two dominant spurious signals: magnetic field strength (a global feature influencing the entire image) and image orientation (a local feature affecting spatial alignment). Through controlled dataset splits, we demonstrate that DNNs can exploit these non-clinical signals, achieving high validation accuracy while failing to generalize to unbiased test data. Alongside these two datasets containing spurious correlations, we also provide benchmark datasets without spurious correlations, allowing researchers to systematically investigate clinically relevant and irrelevant features, uncertainty estimation, adversarial robustness, and generalization strategies. Models and datasets are available at https://github.com/utkuozbulak/spurbreast.
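The controlled splits described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the record fields `label` and `field_strength`, the function name `make_spurious_split`, and the synthetic records are all hypothetical, chosen only to show how a spurious attribute can be tied to the class label in the training and validation sets while the test set stays balanced across every (label, attribute) combination.

```python
import random
from collections import defaultdict


def make_spurious_split(records, spurious_key, aligned,
                        label_key="label", val_fraction=0.2, seed=0):
    """Return (train, val, test) lists of records.

    `aligned` maps each label to the spurious value it co-occurs with in
    train/val, e.g. {1: 3.0, 0: 1.5}. The test set draws the same number
    of records from every (label, spurious value) cell, so the spurious
    attribute carries no information about the label there.
    """
    rng = random.Random(seed)

    # Group records by (label, spurious value) and shuffle each group.
    cells = defaultdict(list)
    for rec in records:
        cells[(rec[label_key], rec[spurious_key])].append(rec)
    for group in cells.values():
        rng.shuffle(group)

    all_cells = [(lab, val) for lab in sorted(aligned)
                 for val in sorted(set(aligned.values()))]

    # Reserve half of the rarest cell's size from every cell for testing.
    n_test = min(len(cells[c]) for c in all_cells) // 2

    test, biased = [], []
    for lab, val in all_cells:
        group = cells[(lab, val)]
        test.extend(group[:n_test])
        # Only label-aligned cells feed train/val; the rest is discarded.
        if aligned[lab] == val:
            biased.extend(group[n_test:])

    rng.shuffle(biased)
    n_val = int(len(biased) * val_fraction)
    return biased[n_val:], biased[:n_val], test


# Synthetic stand-in records (hypothetical): 100 per (label, strength) cell.
records = [
    {"id": i, "label": i % 2, "field_strength": 3.0 if (i // 2) % 2 else 1.5}
    for i in range(400)
]
aligned = {1: 3.0, 0: 1.5}
train_set, val_set, test_set = make_spurious_split(
    records, "field_strength", aligned
)
```

A classifier trained on `train_set` can reach high accuracy on `val_set` by reading field strength alone, whereas the same shortcut is uninformative on `test_set`. A real pipeline would also need to split at the patient level so that slices from one patient never appear in more than one subset; this sketch omits that step.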

Problem

Research questions and friction points this paper is trying to address.

Addressing spurious correlations in breast MRI classification using deep neural networks

Identifying magnetic field strength and image orientation as dominant spurious signals

Providing benchmark datasets to study generalization failures in medical imaging

Innovation

Methods, ideas, or system contributions that make the work stand out.

Curated breast MRI dataset with spurious correlations

Identified magnetic field strength and image orientation as the dominant spurious signals

Provided benchmark datasets without spurious correlations

🔎 Similar Papers

Spurious Correlations in Machine Learning: A Survey