🤖 AI Summary
This study addresses the problem of determining the minimum clinically feasible dataset size needed to train MRI-based AI models for breast cancer classification. Moving beyond conventional evaluations based on image counts, we propose a patient-level framework for analyzing data requirements and introduce a new metric, the “effective patient count.” Using a multicenter MRI dataset, our methodology combines few-shot learning, cross-site robustness evaluation, and uncertainty-driven data importance ranking to quantify how dataset size, lesion diversity, and annotation quality affect model generalizability. The results indicate that 80–120 patients with high-quality expert annotations suffice for models to achieve an AUC above 92% on external multi-institutional validation, which substantially lowers the data acquisition barrier for clinical deployment. Our core contribution is a reproducible, patient-centric paradigm for evaluating data efficiency, providing empirically grounded guidance for data curation and resource planning in medical imaging AI.