Deep Modeling and Interpretation for Bladder Cancer Classification

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the limitations of current deep learning models in bladder cancer image classification, which stem from small lesion proportions, poor calibration, and limited interpretability. It systematically evaluates 13 prominent CNN and Transformer architectures, including ConvNeXt, Vision Transformer (ViT), and Swin Transformer, on classification accuracy, calibration reliability, and interpretability using a multi-center bladder cancer dataset; interpretability is assessed with Grad-CAM++ and further improved through test-time augmentation. Across approximately 300 experiments, ViT variants demonstrate superior calibration, while the ConvNeXt series shows limited generalization (~60% accuracy) and is best suited to in-distribution samples; ViT and its variants interpret out-of-distribution samples more reliably. No single model excels across all criteria, underscoring the importance of task-specific model selection.
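
To make the calibration criterion concrete, below is a minimal sketch of the expected calibration error (ECE), a standard reliability metric for this kind of analysis. The paper does not release its evaluation code, so the function name, binning scheme (15 equal-width bins), and inputs here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """ECE: the weighted average gap between mean confidence and accuracy
    across equal-width confidence bins (illustrative sketch)."""
    confidences = np.asarray(confidences, dtype=float)
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            bin_acc = (predictions[in_bin] == labels[in_bin]).mean()
            bin_conf = confidences[in_bin].mean()
            # Weight each bin's |accuracy - confidence| gap by its sample share.
            ece += in_bin.mean() * abs(bin_acc - bin_conf)
    return ece
```

A well-calibrated model (low ECE) reports confidences that match its empirical accuracy; the claim that ViT variants calibrate better means this confidence-accuracy gap is smaller for them than for the CNN baselines.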

📝 Abstract
Deep models based on vision transformer (ViT) and convolutional neural network (CNN) architectures have demonstrated remarkable performance on natural image datasets. However, these models may not perform similarly in medical imaging, where abnormal regions cover only a small portion of the image. This challenge motivates our investigation of the latest deep models for bladder cancer classification. We evaluate these deep models through: 1) standard classification using 13 models (four CNNs and eight transformer-based models), 2) calibration analysis to examine whether these models are well calibrated for bladder cancer classification, and 3) GradCAM++ to evaluate their interpretability for clinical diagnosis. We run $\sim 300$ experiments on a publicly available multicenter bladder cancer dataset, and the results demonstrate that the ConvNeXt series shows limited generalization ability in classifying bladder cancer images (e.g., $\sim 60\%$ accuracy). In addition, ViTs are better calibrated than the ConvNeXt and Swin Transformer series. We also apply test-time augmentation to improve the models' interpretability. Finally, no model provides a one-size-fits-all solution for interpretable classification: the ConvNeXt series is suitable for in-distribution samples, while ViT and its variants are better suited to interpreting out-of-distribution samples.
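
As a sketch of how the GradCAM++ and test-time-augmentation steps above can be combined, the snippet below averages Grad-CAM++ heatmaps over an identity view and a horizontal-flip view. It assumes the open-source pytorch-grad-cam package; the ImageNet-pretrained ConvNeXt-Tiny backbone, the choice of target layer, and the augmentation set are all stand-in assumptions, since the paper's trained models and code are not released.

```python
import numpy as np
import torch
from torchvision import models
from pytorch_grad_cam import GradCAMPlusPlus
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

# Stand-in backbone: the paper's trained bladder-cancer models are not
# public, so an ImageNet ConvNeXt-Tiny is used purely for illustration.
model = models.convnext_tiny(weights="IMAGENET1K_V1").eval()
target_layers = [model.features[-1]]  # last conv stage (assumed choice)
cam = GradCAMPlusPlus(model=model, target_layers=target_layers)

def tta_gradcampp(input_tensor, class_idx):
    """Average Grad-CAM++ heatmaps over identity and horizontal-flip views,
    mapping each flipped heatmap back to the original frame before averaging.
    input_tensor: preprocessed image batch of shape (1, 3, H, W)."""
    targets = [ClassifierOutputTarget(class_idx)]
    maps = [cam(input_tensor=input_tensor, targets=targets)[0]]
    flipped = torch.flip(input_tensor, dims=[-1])        # horizontal flip
    flipped_map = cam(input_tensor=flipped, targets=targets)[0]
    maps.append(np.flip(flipped_map, axis=-1))           # undo the flip
    return np.mean(maps, axis=0)  # heatmap in [0, 1], shape (H, W)
```

Averaging maps over augmented views tends to suppress augmentation-sensitive artifacts in the saliency map, which is consistent with the abstract's use of test-time augmentation to improve interpretability.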
Problem

Research questions and friction points this paper is trying to address.

bladder cancer classification
medical image analysis
model calibration
interpretability
deep learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision Transformer
Calibration Analysis
GradCAM++
Test Time Augmentation
Bladder Cancer Classification
Ahmad Chaddad
Professor @ School of Artificial Intelligence, GUET; LIVIA-ETS
Artificial Intelligence, Radiomics and Radio-genomics, Signal & Image Processing, Electrical & Electronic Systems
Yihang Wu
Laboratory for AIPM, School of Artificial Intelligence, Guilin University of Electronic Technology, China
Xianrui Chen
Laboratory for AIPM, School of Artificial Intelligence, Guilin University of Electronic Technology, China