🤖 AI Summary
Existing counterfactual explanation methods often neglect genuine feature dependencies in data, yielding infeasible or non-actionable counterfactuals. To address this, the authors propose DANCE, a counterfactual generation method that combines feature dependencies learned from data with expert-provided dependency graphs. DANCE models feature dependencies as linear and nonlinear constraints and balances plausibility, diversity, and sparsity. Its core idea is to enforce these learned or expert-provided constraints during the counterfactual search, so that generated instances stay consistent with feature relationships and remain feasible in practice. Experiments on a real-world email marketing case study with Freshmail and on 140 public benchmark datasets indicate that DANCE outperforms existing approaches on widely used metrics, improving the practical utility of model explanations in real-world decision-making.
📝 Abstract
Counterfactual explanations enhance the actionable interpretability of machine learning models by identifying the minimal changes required to achieve a desired model outcome. However, existing methods often ignore the complex dependencies in real-world datasets, leading to unrealistic or impractical modifications. Motivated by cybersecurity applications in the email marketing domain, we propose a method for generating Diverse, Actionable, and kNowledge-Constrained Explanations (DANCE), which incorporates feature dependencies and causal constraints to ensure the plausibility and real-world feasibility of counterfactuals. Our method either learns linear and nonlinear constraints from data or integrates expert-provided dependency graphs. By maintaining consistency with feature relationships, it produces explanations that align with real-world constraints. Additionally, it balances plausibility, diversity, and sparsity, addressing key limitations of existing algorithms. The work is based on a real-life case study with Freshmail, the largest email marketing company in Poland, and is supported by the joint R&D project Sendguard. Furthermore, we provide an extensive evaluation on 140 public datasets, which highlights the method's ability to generate meaningful, domain-relevant counterfactuals that outperform existing approaches on widely used metrics. The source code for reproducing the results can be found in a GitHub repository we provide.
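
To make the idea of a data-learned linear constraint concrete, the following is a minimal sketch and not the DANCE implementation. It assumes synthetic data in which one feature depends linearly on another, fits that dependency by ordinary least squares, and then checks whether a candidate counterfactual stays consistent with it. The helper names (`fit_linear_dependency`, `violates`, `sparsity`) and the three-standard-deviation tolerance are illustrative assumptions, not taken from the paper or its repository.

```python
import numpy as np

def fit_linear_dependency(X, target, predictors):
    """Least-squares fit of X[:, target] on X[:, predictors] plus an intercept.

    Returns the coefficients (intercept last) and the residual standard deviation.
    """
    A = np.column_stack([X[:, predictors], np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, X[:, target], rcond=None)
    residuals = X[:, target] - A @ coef
    return coef, residuals.std()

def violates(x, target, predictors, coef, tol):
    """True if x deviates from the fitted dependency by more than tol."""
    predicted = np.append(x[predictors], 1.0) @ coef
    return bool(abs(x[target] - predicted) > tol)

def sparsity(x, x_cf):
    """Number of features changed between the original and the counterfactual."""
    return int(np.sum(~np.isclose(x, x_cf)))

# Synthetic data (an assumption for illustration): f1 is roughly 2 * f0 + 1.
rng = np.random.default_rng(0)
f0 = rng.uniform(0, 10, 200)
f1 = 2 * f0 + 1 + rng.normal(0, 0.1, 200)
f2 = rng.uniform(0, 1, 200)
X = np.column_stack([f0, f1, f2])

coef, sigma = fit_linear_dependency(X, target=1, predictors=[0])
tol = 3 * sigma  # hypothetical tolerance: three residual standard deviations

x = np.array([3.0, 7.0, 0.5])        # original instance, consistent with the data
cf_naive = np.array([5.0, 7.0, 0.5])  # changes f0 only and breaks the dependency
cf_aware = np.array([5.0, 11.0, 0.5])  # changes f0 and updates f1 accordingly

naive_violates = violates(cf_naive, 1, [0], coef, tol)
aware_violates = violates(cf_aware, 1, [0], coef, tol)
```

The naive candidate changes fewer features but breaks the learned relationship, while the dependency-aware candidate changes two features and stays consistent with it. This is the kind of trade-off between sparsity and plausibility that the abstract describes.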