shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python

📅 2025-04-02
🤖 AI Summary
Existing model-interpretation tools neglect feature dependencies, leading to biased Shapley value estimates. This paper introduces *shapr*—the first open-source, cross-language (R/Python) toolkit unifying conditional, causal, and asymmetric Shapley value computation. Methodologically, it achieves high-accuracy conditional Shapley estimation via conditional distribution modeling and kernel-based approximation; introduces the first R-native implementation for time-series attribution; and proposes a convergence-aware iterative algorithm integrated with a parallel computing framework, yielding up to 8× speedup empirically. *shapr* seamlessly interfaces with mainstream libraries—including XGBoost, scikit-learn, and forecast—enhancing explanation fidelity in settings with highly correlated features. The R package is available on CRAN, and the Python counterpart (*shaprpy*) is hosted on PyPI. Both include built-in visualization support (via ggplot2 and matplotlib) and native integration with causal graphs.

📝 Abstract
This paper introduces the shapr package, a versatile tool for generating Shapley value explanations for machine learning and statistical regression models in both R and Python. The package emphasizes conditional Shapley value estimates, providing a comprehensive range of approaches for accurately capturing feature dependencies, which is crucial for correct model interpretation and lacking in similar software. In addition to regular tabular data, the shapr R-package includes specialized functionality for explaining time series forecasts. The package offers a minimal set of user functions with sensible defaults for most use cases while providing extensive flexibility for advanced users to fine-tune computations. Additional features include parallelized computations, iterative estimation with convergence detection, and rich visualization tools. shapr also extends its functionality to compute causal and asymmetric Shapley values when causal information is available. In addition, we introduce the shaprpy Python library, which brings core capabilities of shapr to the Python ecosystem. Overall, the package aims to enhance the interpretability of predictive models within a powerful and user-friendly framework.
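To make the role of feature dependence concrete, the following is a minimal sketch that does not use shapr or shaprpy. It assumes a hypothetical linear model and two Gaussian features with correlation 0.9; all names and constants (`B0`, `B`, `MU`, `SD`, `RHO`, `shapley`) are illustrative choices, not part of the package. For a linear model with Gaussian features the conditional expectations have a closed form, so exact Shapley values can be computed with and without conditioning on the known features.

```python
from itertools import combinations
from math import factorial

# Hypothetical setup (not from the paper): f(x) = B0 + B[0]*x1 + B[1]*x2,
# with bivariate Gaussian features (means MU, std devs SD, correlation RHO).
B0, B = 1.0, (2.0, 0.0)          # the model ignores x2 entirely
MU, SD, RHO = (0.0, 0.0), (1.0, 1.0), 0.9

def model(x):
    return B0 + sum(b * xi for b, xi in zip(B, x))

def expected_features(x, known, conditional):
    """Replace unknown features by their mean, optionally conditioned on the known one."""
    filled = list(MU)
    for j in known:
        filled[j] = x[j]
    if conditional and len(known) == 1:
        (j,) = known
        k = 1 - j
        # Gaussian conditional mean: E[x_k | x_j]
        filled[k] = MU[k] + RHO * SD[k] / SD[j] * (x[j] - MU[j])
    return filled

def value(x, known, conditional):
    # For a linear model, E[f(x) | x_S] equals f(E[x | x_S]).
    return model(expected_features(x, known, conditional))

def shapley(x, conditional):
    """Exact Shapley values by enumerating all coalitions."""
    m = len(x)
    phi = []
    for j in range(m):
        others = [k for k in range(m) if k != j]
        total = 0.0
        for size in range(m):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(m - size - 1) / factorial(m)
                gain = value(x, S + (j,), conditional) - value(x, S, conditional)
                total += weight * gain
        phi.append(total)
    return phi

x_explain = (1.0, 1.0)
marginal = shapley(x_explain, conditional=False)
conditional = shapley(x_explain, conditional=True)
print("marginal:   ", marginal)
print("conditional:", conditional)
```

In this toy setting the marginal values assign all credit to the first feature (2 and 0), while the conditional values split it roughly 1.1 to 0.9, because observing the second feature is informative about the first. Both sets sum to the same total, the prediction minus the mean prediction. Real models rarely admit such closed forms, which is why the conditional distributions have to be estimated.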

Problem

Research questions and friction points this paper is trying to address.

Generating Shapley value explanations for ML models
Accurately capturing feature dependencies for interpretation
Extending functionality to causal and asymmetric Shapley values

Innovation

Methods, ideas, or system contributions that make the work stand out.

Conditional Shapley values for accurate dependencies
Specialized time series forecast explanations
Parallelized computations with visualization tools
Martin Jullum
Norwegian Computing Center, Norway
Lars Henry Berge Olsen
University of Oslo, Norway
Jon Lachmann
Stockholm University
Statistics
Annabelle Redelmeier
Norwegian Computing Center, Norway