Exploring Patient Data Requirements in Training Effective AI Models for MRI-Based Breast Cancer Classification

📅 2025-02-22

🏛️ Deep-Breath@MICCAI

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study asks how few patients are needed, in clinically realistic terms, to train MRI-based AI models for breast cancer classification. Rather than counting images, as conventional evaluations do, the authors propose a patient-level framework for analyzing data requirements and introduce a new metric, the “effective patient count.” Using a multicenter MRI dataset, the methodology combines few-shot learning, cross-site robustness evaluation, and uncertainty-driven data importance ranking to quantify how dataset size, lesion diversity, and annotation quality affect model generalizability. The results indicate that 80–120 high-quality, expert-annotated patients are enough for models to exceed an AUC of 0.92 on external multi-institutional validation, which substantially lowers the data acquisition barrier for clinical deployment. The core contribution is a reproducible, patient-centric paradigm for evaluating data efficiency, offering empirically grounded guidance for data curation and resource planning in medical imaging AI.
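To make the idea of a patient-level data requirements analysis concrete, here is a minimal sketch of a learning curve indexed by patient count rather than image count. It is not the authors' pipeline: the cohort is synthetic, the nearest-centroid classifier is a stand-in chosen for brevity, and every name (`make_cohort`, `sample_patients`, `learning_curve`) is hypothetical. The summary does not define the "effective patient count," so the sketch does not attempt to compute it.

```python
import random
from statistics import mean


def make_cohort(n_patients, images_per_patient=4, seed=0):
    """Synthetic stand-in for an MRI cohort (assumption: 3 features per image)."""
    rng = random.Random(seed)
    cohort = []
    for pid in range(n_patients):
        label = pid % 2                      # balanced benign/malignant labels
        offset = rng.gauss(0.0, 0.5)         # per-patient variation
        images = [[rng.gauss(label + offset, 1.0) for _ in range(3)]
                  for _ in range(images_per_patient)]
        cohort.append({"id": pid, "label": label, "images": images})
    return cohort


def patient_features(patient):
    """Average image features so each patient contributes exactly one vector."""
    return [mean(column) for column in zip(*patient["images"])]


def fit_centroids(patients):
    """Nearest-centroid 'model': one mean feature vector per class."""
    centroids = {}
    for label in (0, 1):
        feats = [patient_features(p) for p in patients if p["label"] == label]
        centroids[label] = [mean(column) for column in zip(*feats)]
    return centroids


def score(centroids, patient):
    """Higher score means the patient looks more like class 1."""
    x = patient_features(patient)
    d0 = sum((a - b) ** 2 for a, b in zip(x, centroids[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, centroids[1]))
    return d0 - d1


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sample_patients(patients, n, rng):
    """Draw n patients (n even), half from each class, without replacement."""
    pos = [p for p in patients if p["label"] == 1]
    neg = [p for p in patients if p["label"] == 0]
    return rng.sample(pos, n // 2) + rng.sample(neg, n // 2)


def learning_curve(train, test, sizes, repeats=5, seed=0):
    """Mean held-out AUC for each training-set size, measured in patients."""
    rng = random.Random(seed)
    test_labels = [p["label"] for p in test]
    curve = {}
    for n in sizes:
        aucs = []
        for _ in range(repeats):
            model = fit_centroids(sample_patients(train, n, rng))
            aucs.append(auc(test_labels, [score(model, p) for p in test]))
        curve[n] = mean(aucs)
    return curve


# Split by patient, never by image, so no patient appears on both sides.
cohort = make_cohort(300)
train, test = cohort[:200], cohort[200:]
curve = learning_curve(train, test, sizes=[10, 20, 40, 80, 160])
for n, value in curve.items():
    print(f"{n:4d} patients -> mean AUC {value:.3f}")
```

The design point worth noting is the split: subsampling and evaluation both operate on patients, so several images from one patient can never leak across the train/test boundary and inflate the curve.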

Problem

Research questions and friction points this paper is trying to address.

Determine data quantity for AI training

Optimize MRI-based breast cancer classification

Assess impact of patient count on model performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes foundation models effectively

Trains AI with minimal MRI data

Enhances performance with simple ensembles
