Accurate estimation of feature importance faithfulness for tree models

📅 2024-04-04
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Evaluating how faithfully a feature importance ranking reflects the predictions of a tree-based model remains challenging, because perturbation-based metrics of this kind have typically been computed with Monte Carlo sampling, which is inherently prone to inaccuracies. Method: The paper introduces PGI squared (PGI²), a perturbation-based faithfulness metric that, for decision tree-based regression models, can be computed accurately and efficiently for arbitrary independent feature perturbation distributions. Because the computation involves no sampling, the resulting values are deterministic and free of sampling error. Contribution/Results: Based on PGI², the authors propose a method for ranking features by their importance for the tree model's predictions. Their experiments indicate that, in some respects, the method may identify globally important features better than the state-of-the-art SHAP explainer.

📝 Abstract
In this paper, we consider a perturbation-based metric of predictive faithfulness of feature rankings (or attributions) that we call PGI squared. When applied to decision tree-based regression models, the metric can be computed accurately and efficiently for arbitrary independent feature perturbation distributions. In particular, the computation does not involve Monte Carlo sampling, which has typically been used for computing similar metrics and which is inherently prone to inaccuracies. Moreover, we propose a method of ranking features by their importance for the tree model's predictions based on PGI squared. Our experiments indicate that in some respects, the method may identify the globally important features better than the state-of-the-art SHAP explainer.
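
To make the idea of a sampling-free computation concrete, here is a minimal sketch. It is not the authors' algorithm, and the abstract does not give the exact definition of PGI squared. The sketch assumes the quantity of interest is the expected squared gap between the prediction at a point and the prediction after some features are independently perturbed. The toy tree `TREE`, the helper names, and the Gaussian perturbation are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy regression tree (an assumption, not from the paper).
# Internal node: (feature, threshold, left, right); leaf: a float.
# A sample goes left when x[feature] <= threshold.
TREE = (0, 0.5,
        (1, 0.0, 1.0, 3.0),
        (0, 1.5, 5.0, (1, 1.0, 7.0, 9.0)))


def predict(node, x):
    """Standard deterministic tree prediction."""
    while not isinstance(node, float):
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node


def gaussian_cdf(mu, sigma):
    """CDF of N(mu, sigma^2); any other CDF callable could be used instead."""
    return lambda t: 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


def leaf_probabilities(node, x, cdfs, bounds=None):
    """Yield (leaf_value, probability that the perturbed point reaches it).

    `cdfs` maps each perturbed feature index to the CDF of its perturbed
    value. Unperturbed features keep their value from x. `bounds` tracks,
    per perturbed feature, the interval (lo, hi] implied by the path so far.
    """
    bounds = bounds or {}
    if isinstance(node, float):
        p = 1.0
        for j, (lo, hi) in bounds.items():
            p *= max(0.0, cdfs[j](hi) - cdfs[j](lo))  # independence assumption
        yield node, p
        return
    feature, threshold, left, right = node
    if feature not in cdfs:  # feature is fixed: follow x deterministically
        child = left if x[feature] <= threshold else right
        yield from leaf_probabilities(child, x, cdfs, bounds)
        return
    lo, hi = bounds.get(feature, (-math.inf, math.inf))
    if lo < threshold:  # left branch still has a non-empty interval
        yield from leaf_probabilities(
            left, x, cdfs, {**bounds, feature: (lo, min(hi, threshold))})
    if threshold < hi:  # right branch still has a non-empty interval
        yield from leaf_probabilities(
            right, x, cdfs, {**bounds, feature: (max(lo, threshold), hi)})


def expected_squared_gap(tree, x, cdfs):
    """Exact E[(f(x) - f(x'))^2] under the assumed perturbation model."""
    fx = predict(tree, x)
    return sum(p * (fx - v) ** 2 for v, p in leaf_probabilities(tree, x, cdfs))


if __name__ == "__main__":
    x = [0.0, 0.0]
    # Perturb feature 0 only, with standard Gaussian noise around x[0].
    print(expected_squared_gap(TREE, x, {0: gaussian_cdf(x[0], 1.0)}))
```

The sketch visits each reachable leaf once and multiplies per-feature interval probabilities, so repeated calls return the same value with no sampling noise. A Monte Carlo estimate of the same quantity would instead fluctuate from run to run.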
Problem

Research questions and friction points this paper is trying to address.

Accurate feature importance estimation
Efficient perturbation-based metric computation
Improved feature ranking method
Innovation

Methods, ideas, or system contributions that make the work stand out.

PGI squared metric
Efficient feature perturbation
Improved feature ranking
Authors
Mateusz Gajewski
Faculty of Computing and Telecommunications, Poznan University of Technology, Poznan, Poland; IDEAS NCBR; Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
Adam Karczmarz
University of Warsaw
graph algorithms, data structures
Mateusz Rapicki
Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
Piotr Sankowski
Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland; IDEAS NCBR