ShaplEIG: Bayesian Experimental Design for Shapley Value Estimation

📅 2026-06-01

🤖 AI Summary
This work addresses the challenge of efficiently estimating Shapley values in settings where coalition evaluations are computationally expensive and the evaluation budget is severely constrained. It introduces, for the first time, Bayesian experimental design to this problem by proposing an adaptive sampling method built on a Gaussian process surrogate of the value function. The method selects the most informative coalitions to evaluate by maximizing the expected information gain about the Shapley values. Because Shapley values are linear in the value function, this expected information gain has a closed form, and a computation scheme based on elementary symmetric polynomials reduces its cost from exponential to polynomial in the number of players. Experiments on several costly applications show that, under tight evaluation budgets, the proposed algorithm is consistently more sample-efficient than existing baselines.
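To make the cost being avoided concrete, here is a minimal sketch of exact Shapley value computation by brute-force enumeration. It is not part of ShaplEIG; the function name `shapley_values` and the toy game `v(S) = |S|^2` are assumptions chosen only for illustration. The point is that an exact answer touches all 2^n coalitions, which is what makes budget-limited approximation necessary.

```python
from itertools import combinations
from math import comb


def shapley_values(n, value):
    """Exact Shapley values for an n-player game by full enumeration.

    `value` maps a frozenset of player indices to a float. Evaluations are
    cached so each of the 2^n coalitions is evaluated at most once.
    """
    cache = {}

    def v(coalition):
        key = frozenset(coalition)
        if key not in cache:
            cache[key] = value(key)
        return cache[key]

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n-|S|-1)! / n!  ==  1 / (n * C(n-1, |S|))
            weight = 1.0 / (n * comb(n - 1, size))
            for S in combinations(others, size):
                # Marginal contribution of player i to coalition S
                phi[i] += weight * (v(S + (i,)) - v(S))
    return phi, len(cache)


# Hypothetical toy game: the value of a coalition is its size squared.
phi, n_evals = shapley_values(3, lambda S: float(len(S) ** 2))
print(phi, n_evals)
```

The number of cached evaluations doubles with every added player, so this approach is only feasible for very small games or very cheap value functions.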
📝 Abstract
Shapley values are a principled attribution measure widely used in interpretable machine learning, but their exact computation scales exponentially with the number of players, motivating a wide range of approximation methods based on value function evaluations of sampled coalitions. This raises the question of whether approximation accuracy can be improved by adaptively selecting coalitions for evaluation based on previous evaluations. This is particularly relevant in settings where the value function is costly and the number of evaluations is severely limited, such as retraining-based feature importance, data valuation, and hyperparameter importance. For this purpose, we propose ShaplEIG, a Bayesian experimental design approach that approximates the expensive value function using a Gaussian process surrogate and adaptively selects coalitions based on their expected information gain about the Shapley values. By the linearity of the Shapley values in the value function, we show that the expected information gain is available in closed form. Furthermore, we propose an efficient computation scheme that reduces the complexity from exponential to polynomial in the number of players via elementary symmetric polynomials. In extensive experiments across diverse costly applications, our method consistently improves sample efficiency in the low-budget regime over state-of-the-art baselines.
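The closed-form claim can be illustrated with a small brute-force sketch. It assumes a zero-mean GP with an RBF kernel on coalition indicator vectors, fixed hyperparameters, and Gaussian observation noise; none of these choices are confirmed by the abstract, and all function names are hypothetical. Since the Shapley vector is a linear map `W` of the value function, it is jointly Gaussian with any noisy evaluation, so the mutual information between the two follows from standard Gaussian identities. This sketch enumerates all 2^n coalitions and therefore does not implement the paper's polynomial-time scheme based on elementary symmetric polynomials.

```python
import itertools
from math import factorial

import numpy as np


def coalitions(n):
    """All 2^n coalitions as rows of a binary indicator matrix."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)


def shapley_matrix(Z):
    """Matrix W such that phi = W @ v, with v the vector of coalition values."""
    m, n = Z.shape
    W = np.zeros((n, m))
    for idx in range(m):
        s = int(Z[idx].sum())
        for i in range(n):
            if Z[idx, i] == 1:
                W[i, idx] = factorial(s - 1) * factorial(n - s) / factorial(n)
            else:
                W[i, idx] = -factorial(s) * factorial(n - s - 1) / factorial(n)
    return W


def rbf_kernel(Z, lengthscale=1.0):
    """Assumed prior covariance over coalition values (illustrative choice)."""
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)


def eig_all(K, W, noise=1e-2):
    """Mutual information between phi and a noisy evaluation of each coalition."""
    cov_phi = W @ K @ W.T              # covariance of the Shapley vector
    C = W @ K                          # column s holds Cov(phi, v(S_s))
    var_y = np.diag(K) + noise         # variance of the noisy observation
    quad = np.einsum("is,is->s", C, np.linalg.solve(cov_phi, C))
    return -0.5 * np.log1p(-quad / var_y)


def condition(K, s, noise=1e-2):
    """Posterior covariance over coalition values after observing coalition s."""
    k = K[:, s]
    return K - np.outer(k, k) / (K[s, s] + noise)


n = 4
Z = coalitions(n)
W = shapley_matrix(Z)
K = rbf_kernel(Z, lengthscale=1.5)
first_scores = eig_all(K, W)

picks = []
for _ in range(5):                     # hypothetical budget of 5 evaluations
    s = int(np.argmax(eig_all(K, W)))
    picks.append(s)
    K = condition(K, s)                # covariance update needs no observed value
print([Z[s].astype(int).tolist() for s in picks])
```

With fixed hyperparameters the GP posterior covariance does not depend on the observed values, so this loop only ranks coalitions by geometry. A practical implementation would presumably also update the posterior mean and refit hyperparameters as evaluations arrive, which is where the selection becomes dependent on previous outcomes.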
Problem

Research questions and friction points this paper is trying to address.

Shapley values
Bayesian experimental design
value function evaluation
sample efficiency
costly evaluations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian experimental design
Shapley value estimation
Gaussian process surrogate
expected information gain
elementary symmetric polynomials