Clinical Interpretability of Deep Learning Segmentation Through Shapley-Derived Agreement and Uncertainty Metrics

📅 2025-12-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Poor interpretability of medical image segmentation models hinders clinical deployment. To address this, we propose a Shapley-value-based, contrast-level attribution method that quantifies the contribution of individual MRI sequences (T1, T2, FLAIR, T2-IR) to deep learning segmentation outputs across four architectures (including U-Net). We introduce, for the first time, a consistency metric and an uncertainty quantification that link Shapley-derived feature rankings to clinical expert judgments. Using cross-validation and Dice-coefficient evaluation, we find that in high-performance cases (Dice > 0.6), Shapley rankings align significantly with clinical consensus (p < 0.01); moreover, ranking variance exhibits a strong negative correlation with model performance (U-Net: r = −0.581), serving as a reliable indicator of model trustworthiness. This framework provides an interpretable, verifiable, and clinically aligned attribution methodology for multimodal medical AI.

📝 Abstract
Segmentation is the identification of anatomical regions of interest, such as organs, tissue, and lesions, serving as a fundamental task in computer-aided diagnosis in medical imaging. Although deep learning models have achieved remarkable performance in medical image segmentation, the need for explainability remains critical for ensuring their acceptance and integration in clinical practice, despite the growing research attention in this area. Our approach explored the use of contrast-level Shapley values, a systematic perturbation of model inputs to assess feature importance. While other studies have investigated gradient-based techniques through identifying influential regions in imaging inputs, Shapley values offer a broader, clinically aligned approach, explaining how model performance is fairly attributed to certain imaging contrasts over others. Using the BraTS 2024 dataset, we generated rankings for Shapley values for four MRI contrasts across four model architectures. Two metrics were proposed from the Shapley ranking: agreement between the model's and a "clinician's" imaging ranking, and uncertainty quantified through Shapley ranking variance across cross-validation folds. Higher-performing cases (Dice > 0.6) showed significantly greater agreement with clinical rankings. Increased Shapley ranking variance correlated with decreased performance (U-Net: r = −0.581). These metrics provide clinically interpretable proxies for model reliability, helping clinicians better understand state-of-the-art segmentation models.
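The contrast-level attribution the abstract describes can be sketched as an exact Shapley computation over the small set of MRI contrasts: each "player" is a contrast, and the characteristic function is the segmentation score the model achieves when only a subset of contrasts is available (the remaining inputs perturbed, e.g. zeroed or mean-imputed). The sketch below is illustrative, not the paper's implementation; `value_fn` and the toy score table are assumptions standing in for a trained model's Dice evaluation.

```python
from itertools import combinations
from math import factorial

def shapley_values(contrasts, value_fn):
    """Exact Shapley attribution over a small set of MRI contrasts.

    value_fn(subset) returns a segmentation score (e.g. Dice) when the
    model sees only the contrasts in `subset`; with n = 4 contrasts,
    only 2**4 = 16 subset evaluations are needed, so the exact formula
    is tractable (no sampling approximation required).
    """
    n = len(contrasts)
    phi = {c: 0.0 for c in contrasts}
    for c in contrasts:
        rest = [x for x in contrasts if x != c]
        for k in range(n):
            for subset in combinations(rest, k):
                # Shapley weight for a coalition of size k
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of contrast c to this coalition
                phi[c] += weight * (value_fn(frozenset(subset) | {c})
                                    - value_fn(frozenset(subset)))
    return phi

# Toy score table standing in for a trained model's Dice evaluations
toy = {frozenset(): 0.0,
       frozenset({"T1"}): 0.3,
       frozenset({"FLAIR"}): 0.5,
       frozenset({"T1", "FLAIR"}): 0.7}
phi = shapley_values(["T1", "FLAIR"], lambda s: toy[frozenset(s)])
print(phi)  # attributions sum to v(full set) - v(empty set) = 0.7
```

By the efficiency axiom, the attributions sum to the full-input score minus the empty-input score, which is what makes the per-contrast values directly comparable and rankable.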
Problem

Research questions and friction points this paper is trying to address.

Enhances clinical interpretability of deep learning segmentation models
Assesses feature importance using Shapley-derived agreement and uncertainty metrics
Provides proxies for model reliability to aid clinical acceptance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using Shapley values to explain deep learning segmentation
Proposing agreement and uncertainty metrics for clinical interpretability
Applying metrics to MRI contrasts across multiple model architectures
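The two proposed metrics can be illustrated with a small sketch: agreement as a rank correlation between the Shapley-derived contrast ranking and a clinical reference ranking, and uncertainty as the variance of each contrast's rank across cross-validation folds. The helper functions, fold values, and clinical ordering below are hypothetical, assumed only for illustration; the paper does not specify this exact implementation.

```python
import statistics

def rank(values):
    """Rank positions, 1 = largest Shapley value (ties not handled)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman(r1, r2):
    """Spearman rank correlation for tie-free rankings."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical per-fold Shapley values for (T1, T2, FLAIR, T2-IR)
fold_shapleys = [[0.05, 0.10, 0.40, 0.20],
                 [0.06, 0.12, 0.35, 0.22],
                 [0.04, 0.15, 0.38, 0.18]]
clinical_rank = [4, 3, 1, 2]  # assumed expert ordering of the contrasts

fold_ranks = [rank(s) for s in fold_shapleys]
# Agreement: mean rank correlation with the clinical ranking
agreement = statistics.mean(spearman(r, clinical_rank) for r in fold_ranks)
# Uncertainty: per-contrast rank variance across folds, averaged
uncertainty = statistics.mean(
    statistics.pvariance(col) for col in zip(*fold_ranks))
print(agreement, uncertainty)
```

In this toy example every fold recovers the clinical ordering, so agreement is 1.0 and uncertainty is 0.0; in practice, higher rank variance across folds would signal a less trustworthy attribution, matching the negative correlation with performance reported above.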
Tianyi Ren
Department of Mechanical Engineering, University of Washington, Seattle, WA
Daniel Low
University of Washington School of Medicine, Seattle, WA
Pittra Jaengprajak
University of Washington School of Medicine, Seattle, WA
Juampablo Heras Rivera
Department of Mechanical Engineering, University of Washington, Seattle, WA
Jacob Ruzevick
Department of Neurological Surgery, University of Washington, Seattle, WA
Mehmet Kurt
University of Washington
Deep Learning in Medical Imaging · Traumatic Brain Injury · Brain Biomechanics · MR Elastography · Mechanical Neuroimaging