MamaDino: A Hybrid Vision Model for Breast Cancer 3-Year Risk Prediction

📅 2026-02-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes MamaDino, a novel approach to breast cancer risk prediction that addresses two limitations of existing models: reliance on high-resolution mammograms (with performance degrading at lower resolutions) and the lack of explicit modelling of bilateral breast asymmetry. MamaDino integrates frozen self-supervised DINOv3 Vision Transformer features with a trainable CNN encoder and introduces a BilateralMixer module that explicitly captures asymmetry between the left and right breasts. Operating at a reduced resolution of 512×512 (roughly 13× fewer pixels than standard inputs), the method exploits the complementary inductive biases of convolutional and transformer architectures. Evaluated on both internal and external test sets, MamaDino achieves AUCs up to 0.736, matching the current state-of-the-art model Mirai while remaining robust across diverse populations and imaging devices.

📝 Abstract
Breast cancer screening programmes increasingly seek to move from one-size-fits-all screening intervals to risk-adapted, personalized strategies. Deep learning (DL) has enabled image-based risk models with stronger 1- to 5-year prediction than traditional clinical models, but leading systems (e.g., Mirai) typically use convolutional backbones, very high-resolution inputs (>1M pixels) and simple multi-view fusion, with limited explicit modelling of contralateral asymmetry. We hypothesised that combining complementary inductive biases (convolutional and transformer-based) with explicit contralateral asymmetry modelling would match state-of-the-art 3-year risk prediction even on substantially lower-resolution mammograms, indicating that less detailed images, used in a more structured way, can recover state-of-the-art accuracy. We present MamaDino, a mammography-aware multi-view attentional DINO model. MamaDino fuses frozen self-supervised DINOv3 ViT-S features with a trainable CNN encoder at 512×512 resolution, and aggregates bilateral breast information via a BilateralMixer to output a 3-year breast cancer risk score. We train on 53,883 women from OPTIMAM (UK) and evaluate on matched 3-year case-control cohorts: an in-distribution test set from four screening sites and an external out-of-distribution cohort from an unseen site. At the breast level, MamaDino matches Mirai on both internal and external tests while using ~13× fewer input pixels. Adding the BilateralMixer improves discrimination to AUC 0.736 (vs 0.713) in-distribution and 0.677 (vs 0.666) out-of-distribution, with consistent performance across age, ethnicity, scanner, tumour type and grade. These findings demonstrate that explicit contralateral modelling and complementary inductive biases enable predictions that match Mirai, despite operating on substantially lower-resolution mammograms.
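The abstract describes a three-part design: frozen self-supervised ViT features, a trainable CNN encoder, and a BilateralMixer that combines the left- and right-breast embeddings into one risk score. The following PyTorch sketch illustrates that wiring only; the module names, dimensions, asymmetry feature (an absolute difference of embeddings) and the random-projection stand-in for the frozen DINOv3 ViT-S are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BilateralMixer(nn.Module):
    """Hypothetical mixer: concatenates the left/right embeddings with
    their absolute difference (an explicit asymmetry cue) and maps the
    result to a single risk logit via a small MLP."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, left, right):
        asym = torch.abs(left - right)  # contralateral asymmetry signal
        return self.mlp(torch.cat([left, right, asym], dim=-1))

class MamaDinoSketch(nn.Module):
    """Hybrid-fusion sketch: frozen ViT-style features are concatenated
    with features from a small trainable CNN; the two breast-level
    embeddings are then mixed bilaterally into one 3-year risk logit."""
    def __init__(self, vit_dim=384, cnn_dim=128):
        super().__init__()
        # Stand-in for frozen DINOv3 ViT-S (patchify + mean pool); in
        # practice one would load pretrained weights and freeze them.
        self.vit = nn.Sequential(
            nn.Conv2d(3, vit_dim, kernel_size=16, stride=16),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        for p in self.vit.parameters():
            p.requires_grad = False  # frozen backbone
        self.cnn = nn.Sequential(    # trainable CNN encoder
            nn.Conv2d(3, 16, kernel_size=7, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, cnn_dim))
        self.mixer = BilateralMixer(vit_dim + cnn_dim)

    def encode(self, x):
        # Complementary inductive biases: ViT features + CNN features
        return torch.cat([self.vit(x), self.cnn(x)], dim=-1)

    def forward(self, left, right):  # each (B, 3, 512, 512)
        return self.mixer(self.encode(left), self.encode(right))

model = MamaDinoSketch()
left = torch.randn(2, 3, 512, 512)   # low-resolution left-breast views
right = torch.randn(2, 3, 512, 512)  # low-resolution right-breast views
risk_logit = model(left, right)
print(risk_logit.shape)  # torch.Size([2, 1]) — one risk logit per woman
```

A sigmoid over the logit would yield a 3-year risk probability; the explicit `|left - right|` term is one simple way to make contralateral asymmetry directly visible to the classifier rather than leaving it implicit in per-view features.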
Problem

Research questions and friction points this paper is trying to address.

breast cancer risk prediction
low-resolution mammography
contralateral asymmetry
personalized screening
deep learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

hybrid vision model
contralateral asymmetry modeling
self-supervised DINO features
low-resolution mammography
BilateralMixer
Ruggiero Santeramo
Fondazione Human Technopole, Milan, Italy
Igor Zubarev
Fondazione Human Technopole, Milan, Italy
Florian Jug
Fondazione Human Technopole
Computational Microscopy · Computational Biology · AI · Machine Learning · Computational Imaging