Explanation Multiplicity in SHAP: Characterization and Assessment

πŸ“… 2026-01-19
πŸ“ˆ Citations: 1
✨ Influential: 1
πŸ“„ PDF
πŸ€– AI Summary
This study addresses a troubling inconsistency in feature attribution methods such as SHAP, which can yield substantially divergent explanations even for identical inputs and models, undermining trustworthiness and auditability in high-stakes applications. The work formally defines and quantifies the phenomenon of β€œexplanation multiplicity,” distinguishing its origins in model training or selection from randomness inherent in the explanation procedure itself. To assess stability, the authors introduce a dual-perspective metric that incorporates both feature magnitude and ranking, establish a randomized null model as an interpretable baseline, and develop a comprehensive empirical evaluation framework spanning diverse datasets and model classes. Experiments demonstrate that explanation multiplicity is pervasive and that relying solely on SHAP value magnitudes can lead to misleading conclusions, underscoring the need for rank-sensitive metrics and principled baselines for reliable interpretability assessment.
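The dual-perspective (magnitude vs. ranking) stability idea can be illustrated with a short sketch. The snippet below is not the authors' implementation; it simply contrasts a magnitude-based agreement score (cosine similarity of attribution vectors) with a rank-based one (Jaccard overlap of the top-k features) over repeated explanation runs for a single instance. The helper names, the choice of cosine similarity and Jaccard overlap, and k=5 are illustrative assumptions.

```python
import numpy as np

def magnitude_agreement(phi_a, phi_b):
    """Cosine similarity between two attribution vectors (magnitude view)."""
    a, b = np.asarray(phi_a, float), np.asarray(phi_b, float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def topk_jaccard(phi_a, phi_b, k=5):
    """Jaccard overlap of the k features with largest |attribution| (rank view)."""
    top_a = set(np.argsort(-np.abs(np.asarray(phi_a)))[:k])
    top_b = set(np.argsort(-np.abs(np.asarray(phi_b)))[:k])
    return len(top_a & top_b) / len(top_a | top_b)

def pairwise_stability(runs, k=5):
    """Average pairwise agreement over repeated attribution runs for one instance."""
    mags, ranks = [], []
    for i in range(len(runs)):
        for j in range(i + 1, len(runs)):
            mags.append(magnitude_agreement(runs[i], runs[j]))
            ranks.append(topk_jaccard(runs[i], runs[j], k=k))
    return float(np.mean(mags)), float(np.mean(ranks))

# Illustrative stand-in for 10 repeated explanation runs on a 20-feature instance;
# in practice these would be attribution vectors from repeated SHAP computations.
rng = np.random.default_rng(0)
runs = [rng.normal(size=20) for _ in range(10)]
mag_score, rank_score = pairwise_stability(runs)
print(f"magnitude agreement: {mag_score:.3f}, top-5 overlap: {rank_score:.3f}")
```

A high magnitude score paired with a low top-k overlap is exactly the pattern the paper flags: attributions that look numerically similar while disagreeing about which features matter most.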

πŸ“ Abstract
Post-hoc explanations are widely used to justify, contest, and review automated decisions in high-stakes domains such as lending, employment, and healthcare. Among these methods, SHAP is often treated as providing a reliable account of which features mattered for an individual prediction and is routinely used to support recourse, oversight, and accountability. In practice, however, SHAP explanations can differ substantially across repeated runs, even when the individual, prediction task, and trained model are held fixed. We conceptualize and name this phenomenon explanation multiplicity: the existence of multiple, internally valid but substantively different explanations for the same decision. Explanation multiplicity poses a normative challenge for responsible AI deployment, as it undermines expectations that explanations can reliably identify the reasons for an adverse outcome. We present a comprehensive methodology for characterizing explanation multiplicity in post-hoc feature attribution methods, disentangling sources arising from model training and selection versus stochasticity intrinsic to the explanation pipeline. Furthermore, whether explanation multiplicity is surfaced depends on how explanation consistency is measured. Commonly used magnitude-based metrics can suggest stability while masking substantial instability in the identity and ordering of top-ranked features. To contextualize observed instability, we derive and estimate randomized baseline values under plausible null models, providing a principled reference point for interpreting explanation disagreement. Across datasets, model classes, and confidence regimes, we find that explanation multiplicity is widespread and persists even under highly controlled conditions, including high-confidence predictions. Thus explanation practices must be evaluated using metrics and baselines aligned with their intended societal role.
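The randomized baseline mentioned in the abstract can also be sketched briefly. The snippet below is not the paper's derivation; it is a Monte Carlo estimate, under an assumed null model of uniformly random feature rankings, of the expected top-k overlap between two independent explanations. Observed overlaps near this chance level would indicate essentially random agreement; the function name and parameter choices are hypothetical.

```python
import numpy as np

def null_topk_overlap(d, k, n_trials=10_000, seed=0):
    """Monte Carlo estimate of the expected top-k Jaccard overlap between two
    independent, uniformly random rankings of d features (a simple null model)."""
    rng = np.random.default_rng(seed)
    overlaps = []
    for _ in range(n_trials):
        a = set(rng.permutation(d)[:k])  # top-k of one random ranking
        b = set(rng.permutation(d)[:k])  # top-k of another random ranking
        overlaps.append(len(a & b) / len(a | b))
    return float(np.mean(overlaps))

# Example: with 30 features and a top-5 list, chance-level overlap is small,
# giving a reference point for judging whether observed agreement is meaningful.
print(null_topk_overlap(d=30, k=5))
```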
Problem

Research questions and friction points this paper is trying to address.

explanation multiplicity
SHAP
feature attribution
post-hoc explanations
explanation stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

explanation multiplicity
SHAP
feature attribution
post-hoc explanation
stability metrics
πŸ”Ž Similar Papers
No similar papers found.