🤖 AI Summary
Problem: Low-level (feature-based) counterfactual explanations (CFEs) are overly specific and are expressed in a feature space that often does not align with real-world actions, which limits their actionability. Method: The paper introduces three types of high-level CFEs (hl-continuous, hl-discrete, and hl-id) that shift CFE modeling from the feature level to the action level. It formalizes hl-discrete CFEs as a weighted set cover problem and hl-continuous CFEs as an integer linear program. Because solving these problems for every agent is costly, it also designs data-driven CFE generators that learn from agents and their optimal CFEs to produce optimal CFEs for new agents. This can be viewed as learning an optimal policy over a family of large but deterministic MDPs, including settings in which the actions and their effects are unknown. Results: Experiments on healthcare datasets (BRFSS, Foods, and NHANES) show that the generators are accurate and resource-efficient, and that high-level CFEs have several advantages over conventional feature-level explanations, making action-level recourse a more actionable alternative for model interpretation.
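To make the weighted set cover view of hl-discrete CFEs concrete, here is a minimal Python sketch under assumptions that do not come from the paper: each high-level action (hypothetical names such as `join_gym` and `meal_plan`) has a cost and satisfies a set of hypothetical requirements, and a valid CFE is any set of actions that together satisfy all of the agent's requirements. The function `hl_discrete_cfe` is illustrative only; it finds the minimum-cost cover by exhaustive search, which is practical only for a handful of actions.

```python
from itertools import combinations


def hl_discrete_cfe(requirements, actions):
    """Exact minimum-cost weighted set cover by exhaustive search.

    requirements: set of hypothetical requirement labels the agent must satisfy.
    actions: dict mapping a hypothetical action name to (cost, set of
        requirements that the action satisfies).
    Returns (total_cost, tuple_of_action_names), or None if no cover exists.
    """
    names = sorted(actions)
    best = None
    # Try every subset of actions, smallest subsets first.
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            covered = set()
            for name in subset:
                covered |= actions[name][1]
            if requirements <= covered:  # this subset is a valid cover
                cost = sum(actions[name][0] for name in subset)
                if best is None or cost < best[0]:
                    best = (cost, subset)
    return best


# Toy instance (illustrative values, not taken from the paper's datasets).
requirements = {"bmi", "activity", "diet"}
actions = {
    "join_gym": (3.0, {"bmi", "activity"}),
    "meal_plan": (2.0, {"bmi", "diet"}),
    "daily_walk": (1.0, {"activity"}),
    "nutritionist": (4.0, {"diet"}),
}
print(hl_discrete_cfe(requirements, actions))  # → (3.0, ('daily_walk', 'meal_plan'))
```

Because weighted set cover is NP-hard, exhaustive search grows exponentially with the number of actions; larger instances would call for an integer-programming solver or an approximation such as the greedy set cover algorithm.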
📝 Abstract
Recourse generators provide actionable insights, often through feature-based counterfactual explanations (CFEs), to help negatively classified individuals understand how to adjust their input features to achieve a positive classification. These feature-based CFEs, which we refer to as low-level CFEs, are overly specific (e.g., coding experience: 4 → 5+ years) and are often recommended in a feature space that does not straightforwardly align with real-world actions. To bridge this gap, we introduce three novel recourse types grounded in real-world actions: high-level continuous (hl-continuous), high-level discrete (hl-discrete), and high-level ID (hl-id) CFEs. We formulate single-agent CFE generation methods, modeling the hl-discrete CFE as the solution to a weighted set cover problem and the hl-continuous CFE as the solution to an integer linear program. Since these methods require costly optimization per agent, we propose data-driven CFE generation approaches that, given instances of agents and their optimal CFEs, learn a CFE generator that quickly provides optimal CFEs for new agents. This approach, which can also be viewed as learning an optimal policy in a family of large but deterministic MDPs, considers several problem formulations, including formulations in which the actions and their effects are unknown, and therefore addresses both informational and computational challenges. Through extensive empirical evaluation on publicly available healthcare datasets (BRFSS, Foods, and NHANES), we compare the proposed forms of recourse to low-level CFEs and assess the effectiveness of our data-driven approaches. Empirical results show that the proposed data-driven CFE generators are accurate and resource-efficient, and that the proposed forms of recourse have various advantages over low-level CFEs.
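The integer linear program behind hl-continuous CFEs can likewise be illustrated on a toy instance. This sketch assumes, purely for illustration, a linear classifier with weights `w` and bias `b`, actions whose repeated use shifts the feature vector additively by `effects[a]` at a cost of `costs[a]` per use, and a cap `max_uses` on each action; none of these modeling details come from the paper. Instead of calling an ILP solver, the hypothetical function `hl_continuous_cfe` enumerates every bounded integer vector of action counts, which solves the same program exactly but only scales to tiny instances.

```python
from itertools import product


def hl_continuous_cfe(x, w, b, effects, costs, max_uses):
    """Solve a toy hl-continuous recourse ILP exactly by enumeration.

    Assumed model (hypothetical, for illustration only):
        minimise   sum_a costs[a] * n[a]
        subject to w . (x + sum_a n[a] * effects[a]) + b >= 0
                   n[a] integer, 0 <= n[a] <= max_uses
    Returns (total_cost, counts), or None if no feasible counts exist.
    """
    best = None
    for counts in product(range(max_uses + 1), repeat=len(effects)):
        # Apply each action n times; effects are assumed to add up linearly.
        new_x = list(x)
        for n, delta in zip(counts, effects):
            for i, d in enumerate(delta):
                new_x[i] += n * d
        score = sum(wi * xi for wi, xi in zip(w, new_x)) + b
        if score >= 0:  # the assumed linear classifier now predicts positive
            cost = sum(n * c for n, c in zip(counts, costs))
            if best is None or cost < best[0]:
                best = (cost, counts)
    return best


# Toy instance: two features, three actions (all values illustrative).
x = [2, 1]
w = [1, 2]
b = -10
effects = [(1, 0), (0, 1), (2, 1)]
costs = [1, 3, 4]
print(hl_continuous_cfe(x, w, b, effects, costs, max_uses=3))  # → (6, (2, 0, 1))
```

Here the cheapest recourse uses the first action twice and the third action once, at a total cost of 6. A practical implementation would pass the same objective and constraint to an ILP solver rather than enumerate all `(max_uses + 1) ** len(effects)` count vectors.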