Statistical Significance of Feature Importance Rankings

📅 2024-01-28

📈 Citations: 0

✨ Influential: 0

career value

198K/year

🤖 AI Summary

To address the insufficient statistical stability of random-sampling-based explanation methods (e.g., SHAP, LIME) in feature importance ranking, this paper proposes the first framework that provides *verifiable* statistical significance guarantees for both Top-K feature selection and their strict ordering. Our method introduces: (1) a high-confidence (>1−α) identification mechanism grounded in rigorous hypothesis testing; and (2) two efficient adaptive sampling algorithms, each with theoretical guarantees to exactly recover the top-K features and their correct order. By integrating stochastic sampling theory with confidence set construction, our approach ensures robustness without sacrificing computational efficiency. Empirical evaluation across multiple benchmark datasets demonstrates substantial improvements in ranking stability—reducing average rank volatility by 42%–68% versus baseline methods—while maintaining competitive runtime. This work establishes a statistically principled foundation for trustworthy attribution and robust model interpretation.

Technology Category

Application Category

📝 Abstract

Feature importance scores are ubiquitous tools for understanding the predictions of machine learning models. However, many popular attribution methods suffer from high instability due to random sampling. Leveraging novel ideas from hypothesis testing, we devise techniques that ensure the most important features are correct with high-probability guarantees. These assess the set of $K$ top-ranked features, as well as the order of its elements. Given a set of local or global importance scores, we demonstrate how to retrospectively verify the stability of the highest ranks. We then introduce two efficient sampling algorithms that identify the $K$ most important features, perhaps in order, with probability exceeding $1-alpha$. The theoretical justification for these procedures is validated empirically on SHAP and LIME.

Problem

Research questions and friction points this paper is trying to address.

Feature Importance Stability

Statistical Stability

SHAP LIME

Innovation

Methods, ideas, or system contributions that make the work stand out.

Statistical Method

Feature Importance

Machine Learning Interpretability

🔎 Similar Papers

Spurious Correlations in Machine Learning: A Survey