A Hybrid CNN-VSSM Model for Multi-View, Multi-Task Mammography Analysis: Robust Diagnosis with Attention-Based Fusion

📅 2025-07-22
🤖 AI Summary
In breast cancer screening, subtle imaging findings and diagnostic ambiguity limit the clinical applicability of single-view, single-task AI models. To address this, we propose a multi-view, multi-task deep learning framework that jointly predicts bilateral breast diagnostic labels and BI-RADS scores (2-/3-/5-class classification). Our method introduces a novel hybrid CNN–VSSM backbone to jointly capture local texture and global contextual features, and a gated attention fusion module for dynamic, weighted integration of the four mammographic views, including robust inference when views are missing. Evaluated on clinical mammography data, the model achieves an AUC of 0.9967 and an F1-score of 0.9830 on binary BI-RADS classification, an F1-score of 0.7790 on the three-class task, and a best F1-score of 0.4904 on the five-class task, substantially outperforming existing baselines. The framework demonstrates high accuracy, strong robustness to view occlusion or absence, and enhanced interpretability.

📝 Abstract
Early and accurate interpretation of screening mammograms is essential for effective breast cancer detection, yet it remains a complex challenge due to subtle imaging findings and diagnostic ambiguity. Many existing AI approaches fall short by focusing on single-view inputs or single-task outputs, limiting their clinical utility. To address these limitations, we propose a novel multi-view, multi-task hybrid deep learning framework that processes all four standard mammography views and jointly predicts diagnostic labels and BI-RADS scores for each breast. Our architecture integrates a hybrid CNN–VSSM backbone, combining convolutional encoders for rich local feature extraction with Visual State Space Models (VSSMs) to capture global contextual dependencies. To improve robustness and interpretability, we incorporate a gated attention-based fusion module that dynamically weights information across views, effectively handling cases with missing data. We conduct extensive experiments across diagnostic tasks of varying complexity, benchmarking our proposed hybrid models against baseline CNN architectures and VSSM models in both single-task and multi-task learning settings. Across all tasks, the hybrid models consistently outperform the baselines. In the binary BI-RADS 1 vs. 5 classification task, the shared hybrid model achieves an AUC of 0.9967 and an F1 score of 0.9830. For the more challenging ternary classification, it attains an F1 score of 0.7790, while in the five-class BI-RADS task, the best F1 score reaches 0.4904. These results highlight the effectiveness of the proposed hybrid framework and underscore both the potential and limitations of multi-task learning for improving diagnostic performance and enabling clinically meaningful mammography analysis.
Problem

Research questions and friction points this paper is trying to address.

Improves breast cancer detection via multi-view mammography analysis
Addresses limitations of single-view, single-task AI approaches
Enhances robustness with attention-based fusion for missing data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid CNN-VSSM model for multi-view analysis
Attention-based fusion for dynamic weighting
Joint prediction of labels and BI-RADS scores
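The gated attention fusion described above can be sketched in plain Python: each view's feature vector receives a scalar gate score, missing views are masked out before normalisation, and the fused representation is the weighted sum of the available views. This is a minimal illustrative sketch, not the paper's implementation; in particular, the scoring function here (mean activation) stands in for the learned gating network, and the function name and signature are assumptions.

```python
import math

def gated_view_fusion(view_features, view_mask):
    """Fuse per-view feature vectors with gate scores (illustrative sketch).

    view_features: list of equal-length feature vectors, one per view
                   (e.g. the four standard mammography views).
    view_mask:     list of booleans, False where a view is missing.
    Returns the fused feature vector and the attention weights.
    """
    # Gate score per view: here simply the mean activation. In the real
    # model this would come from a small learned gating network.
    scores = [sum(f) / len(f) for f in view_features]

    # Mask missing views by sending their score to -inf before softmax,
    # so they receive zero weight and inference degrades gracefully.
    masked = [s if present else float("-inf")
              for s, present in zip(scores, view_mask)]

    # Numerically stable softmax over the available views only.
    m = max(masked)
    exps = [math.exp(s - m) if s != float("-inf") else 0.0 for s in masked]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Fused representation: attention-weighted sum of the view features.
    dim = len(view_features[0])
    fused = [sum(w * f[i] for w, f in zip(weights, view_features))
             for i in range(dim)]
    return fused, weights
```

Masking before the softmax (rather than zeroing weights afterwards) keeps the remaining weights summing to one, which is what makes the fusion robust to absent views.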
Yalda Zafari
Department of Mathematics and Statistics, Qatar University, Doha, Qatar
Roaa Elalfy
Department of Mathematics and Statistics, Qatar University, Doha, Qatar
Mohamed Mabrok
Department of Mathematics and Statistics, Qatar University, Doha, Qatar
Somaya Al-Maadeed
Department of Computer Science and Engineering, Qatar University, Doha, Qatar
Tamer Khattab
Professor of Electrical Engineering, Qatar University
Communication and Information Theory; Wireless Security; Machine Learning in Communications.
Essam A. Rashed
Graduate School of Information Science, University of Hyogo, Kobe 650-0047, Japan; Advanced Medical Engineering Research Institute, University of Hyogo, Himeji, 670-0836, Japan