🤖 AI Summary
To address the instability of SHAP explanations in medical image recognition, which arises from both epistemic and aleatoric uncertainty, this paper proposes the first XAI framework that quantifies the uncertainty of SHAP explanations. Methodologically, it integrates Dirichlet posterior sampling with Dempster–Shafer evidence theory to construct three complementary visualizations: belief maps, plausibility maps, and fused maps, enabling verifiable reliability assessment of SHAP attributions. The framework combines SHAP value stability modeling with statistical analysis across multiple medical imaging modalities, including histopathology, ophthalmology, and radiology. Evaluated on three heterogeneous clinical datasets, the approach reduces explanation calibration error (ECE) by 37% and achieves an average clinical expert satisfaction score of 4.8/5.0 for explanation stability, demonstrating its efficacy in supporting trustworthy clinical decision-making.
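As a rough illustration of this pipeline, the sketch below emulates Dirichlet posterior sampling over SHAP attribution maps and derives per-pixel belief, plausibility, and fused maps. All names (`dirichlet_shap_maps`, `ds_maps`), the two-hypothesis frame ({relevant, irrelevant}) per pixel, and the thresholding scheme are illustrative assumptions, not the paper's actual implementation; in the real framework each sample would come from re-running SHAP under a posterior draw rather than rescaling a single base map.

```python
# Minimal sketch, assuming a per-pixel Dempster-Shafer frame {relevant, irrelevant}.
# Not the authors' code: function names, thresholds, and the perturbation model
# are hypothetical stand-ins for the described Dirichlet-posterior SHAP sampling.
import numpy as np

rng = np.random.default_rng(0)

def dirichlet_shap_maps(base_map, class_counts, n_samples=32, noise=0.05):
    """Emulate SHAP attribution maps under Dirichlet posterior samples.

    In the described framework, each sample would come from re-running SHAP
    with class probabilities drawn from a Dirichlet posterior; here we stand
    that in with sample-dependent rescaling plus Gaussian noise.
    """
    maps = []
    for _ in range(n_samples):
        w = rng.dirichlet(class_counts)      # one posterior draw over classes
        scale = w.max()                      # weight of the predicted class
        maps.append(scale * base_map + noise * rng.standard_normal(base_map.shape))
    return np.stack(maps)                    # shape: (n_samples, H, W)

def ds_maps(shap_stack, tau=0.1):
    """Belief, plausibility, and fused maps from a stack of SHAP maps.

    Per pixel, each sample contributes evidence mass: to {relevant} if its
    |SHAP| exceeds tau, to {irrelevant} if it is near zero, and the rest to
    the whole frame (ignorance). Belief is the committed 'relevant' mass;
    plausibility adds the ignorance mass; the fused map is their midpoint,
    which equals the pignistic probability on this two-element frame.
    """
    mag = np.abs(shap_stack)
    m_rel = (mag > tau).mean(axis=0)                 # mass on {relevant}
    m_irr = (mag <= tau / 2).mean(axis=0)            # mass on {irrelevant}
    m_unc = np.clip(1.0 - m_rel - m_irr, 0.0, 1.0)   # mass on the full frame
    belief = m_rel
    plausibility = np.clip(m_rel + m_unc, 0.0, 1.0)
    fused = 0.5 * (belief + plausibility)
    return belief, plausibility, fused

# Toy usage: an 8x8 "attribution map" with one salient region.
base = np.zeros((8, 8))
base[2:5, 2:5] = 0.5
stack = dirichlet_shap_maps(base, class_counts=np.array([30.0, 10.0]))
bel, pl, fus = ds_maps(stack)
print(bel.round(2), pl.round(2), fus.round(2), sep="\n\n")
```

Pixels where belief and plausibility agree are stably attributed across posterior draws; a wide belief–plausibility gap flags attributions whose reliability the framework would mark as uncertain.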
📝 Abstract
Recent advances in deep learning have led to its widespread adoption across diverse domains, including medical imaging. This progress is driven by increasingly sophisticated model architectures, such as ResNets, Vision Transformers, and Hybrid Convolutional Neural Networks, that offer enhanced performance at the cost of greater complexity. This complexity often compromises model explainability and interpretability. SHAP has emerged as a prominent method for providing interpretable visualizations that help domain experts understand model predictions. However, SHAP explanations can be unstable and unreliable in the presence of epistemic and aleatoric uncertainty. In this study, we address this challenge by using Dirichlet posterior sampling and Dempster–Shafer theory to quantify the uncertainty underlying these unstable explanations in medical imaging applications. The framework combines belief, plausibility, and fusion maps with statistical quantitative analysis to quantify the uncertainty in SHAP explanations. Furthermore, we evaluate our framework on three medical imaging datasets, covering examples from pathology, ophthalmology, and radiology, with varying class distributions, image qualities, and modality types; the resulting differences in image resolution and modality-specific characteristics introduce noise and significant epistemic uncertainty.
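For reference, the belief and plausibility maps build on the standard Dempster–Shafer quantities for a mass function $m$ over a frame of discernment $\Theta$; the definitions below are textbook material, while how the paper assigns pixel-level SHAP evidence to $m$ is described only at a high level above.

```latex
% Standard Dempster--Shafer definitions (not paper-specific):
% m : 2^{\Theta} \to [0,1], with m(\emptyset) = 0 and
% \sum_{A \subseteq \Theta} m(A) = 1.
\[
\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B),
\qquad
\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B),
\qquad
\mathrm{Bel}(A) \le \mathrm{Pl}(A).
\]
% Dempster's rule of combination for fusing two evidence sources:
\[
(m_1 \oplus m_2)(A) = \frac{1}{1-K} \sum_{B \cap C = A} m_1(B)\, m_2(C),
\qquad
K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C).
\]
```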