🤖 AI Summary
Existing model-interpretation tools largely neglect feature dependencies, which biases their Shapley value estimates. This paper introduces *shapr*, the first open-source, cross-language (R/Python) toolkit unifying conditional, causal, and asymmetric Shapley value computation. Methodologically, it estimates conditional Shapley values with high accuracy by modeling conditional distributions and applying kernel-based approximation. It also provides the first R-native implementation of attribution for time-series forecasts, and it pairs a convergence-aware iterative algorithm with a parallel computing framework, yielding up to 8× speedup empirically. *shapr* interfaces with mainstream libraries, including XGBoost, scikit-learn, and forecast, and improves explanation fidelity when features are highly correlated. The R package is available on CRAN, and the Python counterpart (*shaprpy*) is hosted on PyPI. Both include built-in visualization support (via ggplot2 and matplotlib) and native integration with causal graphs.
📝 Abstract
This paper introduces the shapr package, a versatile tool for generating Shapley value explanations for machine learning and statistical regression models in both R and Python. The package emphasizes conditional Shapley value estimates, providing a comprehensive range of approaches for accurately capturing feature dependencies, which is crucial for correct model interpretation and lacking in similar software. In addition to regular tabular data, the shapr R-package includes specialized functionality for explaining time series forecasts. The package offers a minimal set of user functions with sensible defaults for most use cases while providing extensive flexibility for advanced users to fine-tune computations. Additional features include parallelized computations, iterative estimation with convergence detection, and rich visualization tools. shapr also extends its functionality to compute causal and asymmetric Shapley values when causal information is available. In addition, we introduce the shaprpy Python library, which brings core capabilities of shapr to the Python ecosystem. Overall, the package aims to enhance the interpretability of predictive models within a powerful and user-friendly framework.
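To make concrete why conditional Shapley values matter when features are dependent, the following is a minimal, self-contained sketch. It does not use shapr or shaprpy and is not their implementation. It assumes a linear model and jointly Gaussian features, a special case where the conditional expectation behind each coalition value has a closed form, so no estimation is needed. The function names `conditional_mean` and `conditional_shapley_linear` and all numbers are hypothetical, chosen for illustration only.

```python
import itertools
import math

import numpy as np


def conditional_mean(mu, sigma, S, x):
    """Return E[X | X_S = x_S] as a full-length vector, assuming X ~ N(mu, sigma)."""
    d = len(mu)
    S = list(S)
    rest = [k for k in range(d) if k not in S]
    out = np.array(mu, dtype=float)  # with S empty this is just the marginal mean
    if S:
        out[S] = x[S]
        if rest:
            # Gaussian conditioning: mu_rest + Sigma_{rest,S} Sigma_{S,S}^{-1} (x_S - mu_S)
            shift = np.linalg.solve(sigma[np.ix_(S, S)], x[S] - mu[S])
            out[rest] = mu[rest] + sigma[np.ix_(rest, S)] @ shift
    return out


def conditional_shapley_linear(beta, intercept, mu, sigma, x):
    """Exact conditional Shapley values for f(x) = intercept + beta @ x (illustrative only)."""
    d = len(x)

    def v(S):
        # Coalition value: E[f(X) | X_S = x_S]; linearity lets us plug in the conditional mean.
        return intercept + beta @ conditional_mean(mu, sigma, S, x)

    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Shapley weight |S|! (d - |S| - 1)! / d!
            w = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
            for S in itertools.combinations(others, size):
                phi[j] += w * (v(S + (j,)) - v(S))
    return phi


# Hypothetical example: feature 1 has a zero coefficient but is correlated with feature 0.
beta = np.array([1.0, 0.0, 2.0])
intercept = 0.5
mu = np.zeros(3)
sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
x = np.array([1.0, 2.0, -1.0])

phi = conditional_shapley_linear(beta, intercept, mu, sigma, x)
phi_indep = conditional_shapley_linear(beta, intercept, mu, np.eye(3), x)

print("dependent features:  ", phi)
print("independent features:", phi_indep)
```

In this toy setting, feature 1 receives a nonzero attribution under dependence even though its coefficient is zero, because observing it is informative about feature 0. When the covariance is the identity, the values reduce to `beta * (x - mu)`. In both cases the attributions sum to the prediction minus the mean prediction. For general models and distributions the conditional expectations have no closed form and must be estimated, which is the task the package's range of approaches addresses.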