Imputation Uncertainty in Interpretable Machine Learning Methods

📅 2025-12-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Missing values are pervasive in real-world data, yet existing interpretable machine learning (IML) methods commonly rely on single imputation, neglecting how imputation-induced uncertainty affects explanation stability—particularly confidence interval coverage probabilities. This work systematically quantifies the impact of imputation strategies on the coverage of confidence intervals for three canonical IML methods: permutation importance, partial dependence plots, and Shapley values. We compare single imputation (mean, median, regression) against multiple imputation (MICE), integrating permutation testing and nonparametric confidence interval estimation. Results show that single imputation severely underestimates variance, yielding actual 95% confidence interval coverage rates frequently below 80%. In contrast, multiple imputation restores coverage close to the nominal level across most settings, substantially enhancing the statistical reliability of IML explanations. To our knowledge, this is the first study to rigorously evaluate and quantify how imputation choices affect the inferential validity of IML outputs.
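The pipeline described above (single imputation vs. MICE-style multiple imputation, with permutation-based importances and pooled uncertainty) can be sketched as follows. This is a minimal illustration, not the paper's actual experiment: it assumes scikit-learn's `IterativeImputer` with `sample_posterior=True` as a MICE-like imputer, uses a linear model on simulated data, and pools variances across completed datasets with Rubin's rules (total variance = mean within-imputation variance + (1 + 1/m) × between-imputation variance).

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n)

# Introduce MCAR missingness in the first feature.
X_miss = X.copy()
X_miss[rng.random(n) < 0.3, 0] = np.nan

# Single (mean) imputation: one completed dataset, so the importance
# variance reflects only permutation noise, not imputation uncertainty.
X_mean = SimpleImputer(strategy="mean").fit_transform(X_miss)
model = LinearRegression().fit(X_mean, y)
pi = permutation_importance(model, X_mean, y, n_repeats=20, random_state=0)
single_var = pi.importances_std[0] ** 2

# Multiple imputation (MICE-style): m completed datasets, each yielding
# its own importance estimate and within-imputation variance.
m = 5
ests, within = [], []
for s in range(m):
    imp = IterativeImputer(sample_posterior=True, random_state=s)
    Xc = imp.fit_transform(X_miss)
    mdl = LinearRegression().fit(Xc, y)
    r = permutation_importance(mdl, Xc, y, n_repeats=20, random_state=s)
    ests.append(r.importances_mean[0])
    within.append(r.importances_std[0] ** 2)

# Rubin's rules: total variance adds a between-imputation component
# that single imputation ignores entirely.
between = np.var(ests, ddof=1)
total_var = float(np.mean(within)) + (1 + 1 / m) * between
```

The `between` term is exactly the imputation-induced uncertainty that single imputation drops, which is why its confidence intervals come out too narrow.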

📝 Abstract
Missing values occur frequently in real data, which affects interpretation by interpretable machine learning (IML) methods. Recent work considers bias and shows that model explanations may differ between imputation methods, while ignoring additional imputation uncertainty and its influence on variance and confidence intervals. We therefore compare the effects of different imputation methods on the confidence interval coverage probabilities of three IML methods: permutation feature importance, partial dependence plots, and Shapley values. We show that single imputation leads to underestimation of variance and that, in most cases, only multiple imputation comes close to nominal coverage.
Problem

Research questions and friction points this paper is trying to address.

Evaluates imputation uncertainty effects on IML confidence intervals
Compares single vs multiple imputation for variance estimation accuracy
Analyzes coverage probabilities for feature importance and Shapley values
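The coverage probabilities analyzed above are estimated by repetition: simulate many datasets, build the nominal 95% confidence interval in each, and count how often it contains the true value. A toy Monte Carlo sketch for the mean of a normal sample (an illustrative stand-in, not the paper's setup, where the target would be a feature importance or Shapley value):

```python
import numpy as np

rng = np.random.default_rng(42)
true_mean, n, reps, z = 0.0, 100, 2000, 1.96
hits = 0
for _ in range(reps):
    x = rng.normal(loc=true_mean, size=n)
    se = x.std(ddof=1) / np.sqrt(n)          # standard error of the mean
    lo, hi = x.mean() - z * se, x.mean() + z * se
    hits += (lo <= true_mean <= hi)          # did the 95% CI cover the truth?
coverage = hits / reps                       # should be close to 0.95
```

When the standard error is underestimated, as with single imputation, the same procedure yields intervals that are too narrow and coverage drops well below the nominal 95%.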
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiple imputation reduces variance underestimation in IML
Compares imputation effects on confidence intervals for IML methods
Shows single imputation often fails to achieve nominal coverage