🤖 AI Summary
Achieving high-accuracy, interpretable classification of brain tumor MRI images remains challenging due to the need for both diagnostic precision and clinical transparency.
Method: This paper proposes a hybrid framework integrating deep learning with clinical knowledge: (1) a soft-voting ensemble of MobileNetV2 and DenseNet121; (2) Grad-CAM++ for class-specific lesion-localization heatmaps; and (3) a symbolic Clinical Decision Rule Overlay (CDRO) that maps model predictions to established radiological heuristics.
Contribution/Results: Evaluated with stratified five-fold cross-validation on the Figshare dataset for glioma, meningioma, and pituitary adenoma classification, the ensemble achieves 91.7% accuracy and a 91.6% F1-score. Grad-CAM++ heatmaps reach Dice coefficients of up to 0.88 and IoU scores of up to 0.78 against expert-annotated tumor regions. Five radiologists rated the usefulness of the explanations at a mean of 4.4/5. These results support the framework's transparency and clinical relevance.
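The soft-voting step named in the method can be illustrated with a minimal sketch. It assumes each network outputs a softmax probability vector over the three classes and that the two vectors are averaged with equal weights; the weights, the function name `soft_vote`, and the example probabilities are illustrative assumptions, not values reported by the paper.

```python
CLASSES = ("glioma", "meningioma", "pituitary adenoma")


def soft_vote(prob_a, prob_b, weights=(0.5, 0.5)):
    """Return (winning class index, weighted-average probabilities).

    Equal weights are an assumption; the source does not state the weighting.
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same set of classes")
    w_a, w_b = weights
    total = w_a + w_b
    # Weighted mean of the two probability vectors, class by class.
    averaged = [(w_a * a + w_b * b) / total for a, b in zip(prob_a, prob_b)]
    # The ensemble prediction is the class with the highest averaged probability.
    winner = max(range(len(averaged)), key=averaged.__getitem__)
    return winner, averaged


# Hypothetical softmax outputs for one MRI slice (made-up numbers).
mobilenet_probs = [0.50, 0.40, 0.10]
densenet_probs = [0.20, 0.70, 0.10]

winner, averaged = soft_vote(mobilenet_probs, densenet_probs)
print(CLASSES[winner], averaged)
```

In this made-up case the two networks disagree on their top class, and the averaged probabilities favor meningioma because DenseNet121 is more confident in its choice than MobileNetV2 is in its own. A hard (majority) vote would have no principled way to break that tie.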
📝 Abstract
Accurate and interpretable classification of brain tumors from magnetic resonance imaging (MRI) is critical for effective diagnosis and treatment planning. This study presents an ensemble-based deep learning framework that combines MobileNetV2 and DenseNet121 convolutional neural networks (CNNs) using a soft voting strategy to classify three common brain tumor types: glioma, meningioma, and pituitary adenoma. The models were trained and evaluated on the Figshare dataset using a stratified 5-fold cross-validation protocol. To enhance transparency and clinical trust, the framework integrates an Explainable AI (XAI) module employing Grad-CAM++ for class-specific saliency visualization, alongside a symbolic Clinical Decision Rule Overlay (CDRO) that maps predictions to established radiological heuristics. The ensemble classifier achieved superior performance compared to individual CNNs, with an accuracy of 91.7%, precision of 91.9%, recall of 91.7%, and F1-score of 91.6%. Grad-CAM++ visualizations revealed strong spatial alignment between model attention and expert-annotated tumor regions, supported by Dice coefficients up to 0.88 and IoU scores up to 0.78. Clinical rule activation further validated model predictions in cases with distinct morphological features. A human-centered interpretability assessment involving five board-certified radiologists yielded high Likert-scale scores for both explanation usefulness (mean = 4.4) and heatmap-region correspondence (mean = 4.0), reinforcing the framework's clinical relevance. Overall, the proposed approach offers a robust, interpretable, and generalizable solution for automated brain tumor classification, advancing the integration of deep learning into clinical neurodiagnostics.
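To make the reported overlap metrics concrete, the sketch below computes the Dice coefficient and IoU between a thresholded saliency map and a binary expert mask. The 0.5 threshold, the helper names, and the 3x3 toy arrays are assumptions for illustration; the abstract does not say how the Grad-CAM++ heatmaps were binarized.

```python
def binarize(heatmap, threshold=0.5):
    """Turn a heatmap with values in [0, 1] into a 0/1 mask (threshold is assumed)."""
    return [[1 if value >= threshold else 0 for value in row] for row in heatmap]


def dice_and_iou(pred_mask, true_mask):
    """Return (Dice, IoU) for two equally sized 0/1 masks."""
    intersection = sum(
        p & t
        for pred_row, true_row in zip(pred_mask, true_mask)
        for p, t in zip(pred_row, true_row)
    )
    pred_area = sum(map(sum, pred_mask))
    true_area = sum(map(sum, true_mask))
    union = pred_area + true_area - intersection
    if union == 0:
        # Both masks are empty: treat as perfect agreement.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + true_area)
    iou = intersection / union
    return dice, iou


# Toy 3x3 heatmap and expert annotation (made-up values).
heatmap = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.0],
    [0.1, 0.0, 0.0],
]
expert_mask = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]

dice, iou = dice_and_iou(binarize(heatmap), expert_mask)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

For a single pair of masks the two metrics are tied by Dice = 2·IoU / (1 + IoU), so an IoU of 0.78 corresponds to a Dice of roughly 0.88. That is consistent with the best-case figures quoted above, if both come from the same image.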