🤖 AI Summary
In breast cancer screening, subtle imaging findings and diagnostic ambiguity limit the clinical applicability of single-view, single-task AI models. To address this, we propose a multi-view, multi-task deep learning framework that jointly predicts bilateral breast diagnostic labels and BI-RADS scores (2-/3-/5-class classification). Our method introduces a novel CNN–VSSM hybrid backbone to jointly capture local texture and global contextual features, together with a gated attention fusion module for dynamic, weighted integration of the four mammographic views, including robust inference under missing-view conditions. Evaluated on clinical mammography data, the model achieves an AUC of 0.9967 and an F1-score of 0.9830 on binary BI-RADS classification, an F1-score of 0.7790 on the three-class task, and a best F1-score of 0.4904 on the five-class task, substantially outperforming existing baselines. The framework demonstrates high accuracy, strong robustness to view occlusion or absence, and enhanced interpretability.
📝 Abstract
Early and accurate interpretation of screening mammograms is essential for effective breast cancer detection, yet it remains a complex challenge due to subtle imaging findings and diagnostic ambiguity. Many existing AI approaches fall short by focusing on single-view inputs or single-task outputs, limiting their clinical utility. To address these limitations, we propose a novel multi-view, multi-task hybrid deep learning framework that processes all four standard mammography views and jointly predicts diagnostic labels and BI-RADS scores for each breast. Our architecture integrates a hybrid CNN–VSSM backbone, combining convolutional encoders for rich local feature extraction with Visual State Space Models (VSSMs) to capture global contextual dependencies. To improve robustness and interpretability, we incorporate a gated attention-based fusion module that dynamically weights information across views, effectively handling cases with missing data. We conduct extensive experiments across diagnostic tasks of varying complexity, benchmarking our proposed hybrid models against baseline CNN architectures and VSSM models in both single-task and multi-task learning settings. Across all tasks, the hybrid models consistently outperform the baselines. In the binary BI-RADS 1 vs. 5 classification task, the shared hybrid model achieves an AUC of 0.9967 and an F1-score of 0.9830. For the more challenging ternary classification, it attains an F1-score of 0.7790, while in the five-class BI-RADS task, the best F1-score reaches 0.4904. These results highlight the effectiveness of the proposed hybrid framework and underscore both the potential and limitations of multi-task learning for improving diagnostic performance and enabling clinically meaningful mammography analysis.
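The gated attention-based fusion described above can be illustrated with a minimal PyTorch sketch. This is a hypothetical reconstruction, not the authors' implementation: the class name `GatedAttentionFusion`, the gate network shape, and the use of a masked softmax to zero out absent views are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class GatedAttentionFusion(nn.Module):
    """Hypothetical sketch of gated attention fusion over mammographic views.

    Each of the four views contributes a feature vector; a small gate network
    scores each view, and a masked softmax renormalises the scores so that
    missing views receive zero weight. Architecture details are assumed.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Per-view scalar gate: (batch, views, dim) -> (batch, views, 1)
        self.gate = nn.Sequential(
            nn.Linear(dim, dim),
            nn.Tanh(),
            nn.Linear(dim, 1),
        )

    def forward(self, feats: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # feats: (batch, views, dim); mask: (batch, views), 1 = view present
        scores = self.gate(feats).squeeze(-1)                  # (batch, views)
        scores = scores.masked_fill(mask == 0, float("-inf"))  # drop absent views
        weights = torch.softmax(scores, dim=1)                 # sums to 1 per case
        return (weights.unsqueeze(-1) * feats).sum(dim=1)      # (batch, dim)


# Usage: two cases, four views each; the second case is missing one view.
torch.manual_seed(0)
fusion = GatedAttentionFusion(dim=8)
feats = torch.randn(2, 4, 8)
mask = torch.tensor([[1, 1, 1, 1], [1, 0, 1, 1]])
fused = fusion(feats, mask)  # shape: (2, 8)
```

Because the softmax is taken over only the present views, the fused representation stays well defined under missing-view conditions without retraining or imputation.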