shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python

📅 2025-04-02

🤖 AI Summary

Existing model-interpretation tools often neglect feature dependencies, which leads to biased Shapley value estimates. This paper introduces *shapr*, the first open-source, cross-language (R/Python) toolkit that unifies conditional, causal, and asymmetric Shapley value computation. Methodologically, it estimates conditional Shapley values accurately by modeling conditional distributions and using kernel-based approximation; it introduces the first R-native implementation for time-series attribution; and it proposes a convergence-aware iterative algorithm, integrated with a parallel computing framework, that yields up to 8× speedup empirically. *shapr* interfaces with mainstream libraries, including XGBoost, scikit-learn, and forecast, and improves explanation fidelity in settings with highly correlated features. The R package is available on CRAN, and the Python counterpart (*shaprpy*) is hosted on PyPI. Both include built-in visualization support (via ggplot2 and matplotlib) and native integration with causal graphs.

📝 Abstract

This paper introduces the shapr package, a versatile tool for generating Shapley value explanations for machine learning and statistical regression models in both R and Python. The package emphasizes conditional Shapley value estimates, providing a comprehensive range of approaches for accurately capturing feature dependencies, which is crucial for correct model interpretation and lacking in similar software. In addition to regular tabular data, the shapr R-package includes specialized functionality for explaining time series forecasts. The package offers a minimal set of user functions with sensible defaults for most use cases while providing extensive flexibility for advanced users to fine-tune computations. Additional features include parallelized computations, iterative estimation with convergence detection, and rich visualization tools. shapr also extends its functionality to compute causal and asymmetric Shapley values when causal information is available. In addition, we introduce the shaprpy Python library, which brings core capabilities of shapr to the Python ecosystem. Overall, the package aims to enhance the interpretability of predictive models within a powerful and user-friendly framework.
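To make the role of feature dependence concrete, here is a minimal sketch that contrasts marginal and conditional Shapley values on a toy problem. It does not use shapr or shaprpy. Everything in it is an illustrative assumption: a two-feature linear model, standard bivariate Gaussian features with correlation `RHO`, and the hypothetical names `shapley_values`, `marginal_value`, and `conditional_value`. In this setting the conditional expectation has a closed form, so no estimation is needed.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values from a coalition value function v(S)."""
    phi = [0.0] * n_features
    for j in range(n_features):
        others = [k for k in range(n_features) if k != j]
        for size in range(len(others) + 1):
            # Classic Shapley weight |S|! (M - |S| - 1)! / M!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[j] += weight * (value_fn(s | {j}) - value_fn(s))
    return phi


# Hypothetical toy setup: f(x) = BETA0 + BETA[0]*x0 + BETA[1]*x1, with
# features that are standard normal (mean 0, variance 1), correlation RHO.
BETA0, BETA = 0.0, (1.0, 0.5)
RHO = 0.8
X = (1.0, 2.0)  # the observation being explained


def marginal_value(s):
    # Features outside S are integrated over their marginal distribution,
    # so each contributes its mean (0) regardless of what is in S.
    return BETA0 + sum(BETA[j] * X[j] for j in s)


def conditional_value(s):
    # E[f(X) | X_S = x_S]; for a standard bivariate normal,
    # E[X_k | X_j = x_j] = RHO * x_j.
    if len(s) == 1:
        (j,) = s
        k = 1 - j
        return BETA0 + BETA[j] * X[j] + BETA[k] * RHO * X[j]
    return marginal_value(s)  # empty and full coalitions coincide


phi_marginal = shapley_values(marginal_value, 2)
phi_conditional = shapley_values(conditional_value, 2)
print("marginal:   ", phi_marginal)     # approximately [1.0, 1.0]
print("conditional:", phi_conditional)  # approximately [0.4, 1.6]
```

Both attributions sum to the same total, f(x) minus the mean prediction, but they split it differently once the correlation between the features is taken into account.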

Problem

Research questions and friction points this paper is trying to address.

Generating Shapley value explanations for ML models

Accurately capturing feature dependencies for interpretation

Extending functionality to causal and asymmetric Shapley values
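As a rough illustration of the asymmetric variant, the sketch below averages marginal contributions only over feature orderings that respect a given causal precedence, instead of over all orderings. It is a generic textbook-style construction, not shapr's implementation. The function `asymmetric_shapley`, its `precedence` argument, and the toy linear-Gaussian value function (two standard normal features with correlation `RHO`) are all assumptions made for this example.

```python
from itertools import permutations


def asymmetric_shapley(value_fn, n_features, precedence=()):
    """Average marginal contributions over orderings consistent with
    `precedence`, an iterable of (a, b) pairs meaning a must come before b.
    With no pairs, this reduces to the ordinary (symmetric) Shapley value."""
    phi = [0.0] * n_features
    n_valid = 0
    for order in permutations(range(n_features)):
        position = {j: i for i, j in enumerate(order)}
        if any(position[a] > position[b] for a, b in precedence):
            continue  # ordering violates the assumed causal order
        n_valid += 1
        coalition = frozenset()
        prev = value_fn(coalition)
        for j in order:
            coalition = coalition | {j}
            cur = value_fn(coalition)
            phi[j] += cur - prev
            prev = cur
    if n_valid == 0:
        raise ValueError("no ordering satisfies the precedence constraints")
    return [p / n_valid for p in phi]


# Hypothetical toy setup: f(x) = BETA[0]*x0 + BETA[1]*x1, features standard
# normal with correlation RHO, so E[X_k | X_j = x_j] = RHO * x_j.
BETA = (1.0, 0.5)
RHO = 0.8
X = (1.0, 2.0)


def conditional_value(s):
    if len(s) == 1:
        (j,) = s
        return (BETA[j] + BETA[1 - j] * RHO) * X[j]
    return sum(BETA[j] * X[j] for j in s)


phi_symmetric = asymmetric_shapley(conditional_value, 2)
# Assume feature 0 is a cause of feature 1, so it must enter first.
phi_asymmetric = asymmetric_shapley(conditional_value, 2, precedence=[(0, 1)])
print("symmetric: ", phi_symmetric)   # approximately [0.4, 1.6]
print("asymmetric:", phi_asymmetric)  # approximately [1.4, 0.6]
```

Under the assumed ordering, the upstream feature is credited with the part of the prediction it explains through the downstream one, while the total attribution stays the same.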

Innovation

Methods, ideas, or system contributions that make the work stand out.

Conditional Shapley values that account for feature dependencies

Specialized time series forecast explanations

Parallelized computations with visualization tools
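The abstract also mentions iterative estimation with convergence detection. The sketch below shows one generic way such a loop can look: a permutation-sampling Monte Carlo estimator that adds batches of random orderings until the largest estimated standard error drops below a tolerance. This is a swapped-in, simplified technique for illustration and should not be read as the algorithm shapr uses. The names `iterative_shapley`, `tol`, `batch_size`, and `quadratic_game` are hypothetical.

```python
import random
from math import sqrt


def iterative_shapley(value_fn, n_features, tol=0.1, batch_size=500,
                      max_permutations=100_000, seed=0):
    """Estimate Shapley values by sampling permutations in batches and
    stopping once every standard error is below `tol`."""
    rng = random.Random(seed)
    total = [0.0] * n_features     # running sum of marginal contributions
    total_sq = [0.0] * n_features  # running sum of their squares
    order = list(range(n_features))
    n = 0
    converged = False
    while n < max_permutations and not converged:
        for _ in range(batch_size):
            rng.shuffle(order)
            coalition = set()
            prev = value_fn(frozenset(coalition))
            for j in order:
                coalition.add(j)
                cur = value_fn(frozenset(coalition))
                delta = cur - prev
                total[j] += delta
                total_sq[j] += delta * delta
                prev = cur
        n += batch_size
        means = [t / n for t in total]
        # Standard error of the mean from the sample variance.
        std_errs = [sqrt(max(sq - n * m * m, 0.0) / (n - 1) / n)
                    for sq, m in zip(total_sq, means)]
        converged = max(std_errs) < tol
    return {"phi": means, "std_err": std_errs,
            "n_permutations": n, "converged": converged}


# Hypothetical test game with known Shapley values:
# v(S) = (sum of weights in S)^2 gives phi_j = w_j * sum(all weights).
WEIGHTS = (1.0, 2.0, 3.0)


def quadratic_game(s):
    return sum(WEIGHTS[j] for j in s) ** 2


result = iterative_shapley(quadratic_game, 3)
exact = [w * sum(WEIGHTS) for w in WEIGHTS]  # [6.0, 12.0, 18.0]
print("estimate:", result["phi"])
print("exact:   ", exact)
print("permutations used:", result["n_permutations"])
```

Because each sampled ordering is independent of the others, batches like these could also be distributed across workers, which is the kind of workload that parallelized computation targets.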
