FLEX: Feature Importance from Layered Counterfactual Explanations

📅 2025-11-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Machine learning models are difficult to deploy safely in high-stakes applications due to their lack of interpretability. Existing counterfactual explanations operate at the instance level only, failing to systematically characterize how features drive model decisions locally or globally. This paper introduces FLEX, a framework enabling consistent feature importance quantification across local, regional, and global granularities. FLEX leverages counterfactual generation coupled with neighborhood aggregation to compute feature perturbation frequencies, supports customizable actionability constraints, and uncovers context-specific drivers overlooked by conventional methods. Compatible with diverse counterfactual algorithms, FLEX is validated against SHAP and LIME. In traffic accident severity prediction and loan approval tasks, its global rankings exhibit strong agreement with SHAP, while regional analysis identifies critical contextual factors missed by global approaches, demonstrating both fidelity and enhanced diagnostic capability.

📝 Abstract
Machine learning models achieve state-of-the-art performance across domains, yet their lack of interpretability limits safe deployment in high-stakes settings. Counterfactual explanations are widely used to provide actionable "what-if" recourse, but they typically remain instance-specific and do not quantify which features systematically drive outcome changes within coherent regions of the feature space or across an entire dataset. We introduce FLEX (Feature importance from Layered counterfactual EXplanations), a model- and domain-agnostic framework that converts sets of counterfactuals into feature change frequency scores at local, regional, and global levels. FLEX generalises local change-frequency measures by aggregating across instances and neighbourhoods, offering interpretable rankings that reflect how often each feature must change to flip predictions. The framework is compatible with different counterfactual generation methods, allowing users to emphasise characteristics such as sparsity, feasibility, or actionability, thereby tailoring the derived feature importances to practical constraints. We evaluate FLEX on two contrasting tabular tasks: traffic accident severity prediction and loan approval, and compare FLEX to SHAP- and LIME-derived feature importance values. Results show that (i) FLEX's global rankings correlate with SHAP while surfacing additional drivers, and (ii) regional analyses reveal context-specific factors that global summaries miss. FLEX thus bridges the gap between local recourse and global attribution, supporting transparent and intervention-oriented decision-making in risk-sensitive applications.
Problem

Research questions and friction points this paper is trying to address.

Quantifying systematic feature importance across local, regional, and global levels
Converting counterfactual explanations into interpretable feature change frequency scores
Bridging the gap between instance-specific recourse and dataset-wide feature attribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Converts counterfactuals into feature change frequency scores
Aggregates feature importance across local, regional, and global levels
Compatible with various counterfactual generation methods
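The core conversion described above, turning a set of counterfactuals into feature change frequency scores, can be sketched as follows. This is a minimal illustration of the idea, not the paper's implementation; the function name, tolerance, and toy data are hypothetical.

```python
import numpy as np

def feature_change_frequency(X, X_cf, feature_names, tol=1e-8):
    """Score each feature by the fraction of instance/counterfactual
    pairs in which its value changes (a change-frequency score).

    X, X_cf : array-likes of shape (n_instances, n_features), where
    X_cf[i] is a counterfactual generated for X[i]. Hypothetical
    sketch of the change-frequency idea, not FLEX's actual code.
    """
    # Boolean mask: True where a feature differs between an instance
    # and its counterfactual beyond a numerical tolerance.
    changed = np.abs(np.asarray(X) - np.asarray(X_cf)) > tol
    # Average over instances: how often each feature must change
    # to flip the prediction.
    freq = changed.mean(axis=0)
    return dict(zip(feature_names, freq))

# Toy example: 3 instances, 2 features. "income" changes in 2 of 3
# counterfactuals, "employed" in 1 of 3.
X    = [[1.0, 30.0], [0.0, 45.0], [1.0, 50.0]]
X_cf = [[1.0, 40.0], [1.0, 45.0], [1.0, 60.0]]
scores = feature_change_frequency(X, X_cf, ["employed", "income"])
```

Regional and global scores would then follow by restricting or expanding the set of instance/counterfactual pairs over which the mean is taken (e.g., averaging only within a neighbourhood of the feature space for a regional score).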
Nawid Keshtmand
University of Bristol
Roussel Desmond Nzoyem
University of Bristol
Sequence Modelling · Meta-Learning · Scientific Machine Learning · Scientific Computing · HPC
Jeffrey Nicholas Clark
University of Bristol