🤖 AI Summary
This work formally defines and systematically investigates data poisoning attacks against counterfactual explanations (CEs), showing how poisoned training data can substantially inflate the cost of algorithmic recourse at the instance, subgroup, and population levels. The authors propose theoretically grounded poisoning strategies with correctness proofs and run adversarial injection experiments against state-of-the-art CE generators, including DiCE and CFProto, combined with a multi-level cost measurement framework. The results show that current CE methods are systematically vulnerable: poisoning causes substantial increases in counterfactual path length or prevents recourse from being generated at all. The contribution is twofold: (1) it raises a concrete security concern about the robustness of explainable AI systems, and (2) it establishes the first principled framework for analyzing data poisoning against counterfactual explanations, supporting rigorous robustness evaluation and defense research for trustworthy AI.
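To make the measurement concrete, below is a minimal, self-contained sketch of the kind of before/after recourse-cost comparison described above, using the public `dice-ml` API on a toy data set. The poisoning heuristic shown (flipping the labels of favorably classified points near the query) and the helper name `recourse_cost` are illustrative assumptions introduced here, not the paper's proposed strategies.

```python
# Illustrative sketch only: measure the cost of recourse reported by DiCE
# before and after a naive label-flipping poisoning of the training data.
# The poisoning heuristic below is an assumption for demonstration purposes,
# not one of the strategies analyzed in the paper.
import numpy as np
import pandas as pd
import dice_ml
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def recourse_cost(df_train, x_query):
    """Fit a model on df_train and return the L1 cost of the cheapest
    counterfactual DiCE finds for x_query (np.inf if none is found)."""
    X, y = df_train.drop(columns="target"), df_train["target"]
    clf = RandomForestClassifier(random_state=0).fit(X, y)
    data = dice_ml.Data(dataframe=df_train,
                        continuous_features=list(X.columns),
                        outcome_name="target")
    model = dice_ml.Model(model=clf, backend="sklearn")
    explainer = dice_ml.Dice(data, model, method="random")
    try:
        result = explainer.generate_counterfactuals(
            x_query, total_CFs=3, desired_class="opposite")
        cfs = result.cf_examples_list[0].final_cfs_df[X.columns].to_numpy(dtype=float)
        return float(np.abs(cfs - x_query.to_numpy(dtype=float)).sum(axis=1).min())
    except Exception:  # DiCE raises if no counterfactual can be found
        return np.inf

# Toy data set and a query instance that currently receives the unfavorable outcome.
Xs, ys = make_classification(n_samples=500, n_features=4, random_state=0)
cols = [f"f{i}" for i in range(4)]
df = pd.DataFrame(Xs, columns=cols)
df["target"] = ys
x_query = df[df["target"] == 0].iloc[[0]][cols]

cost_clean = recourse_cost(df, x_query)

# Naive poisoning: flip the labels of the favorably classified points closest
# to the query, pushing the nearest "favorable" region further away.
fav = df[df["target"] == 1]
nearest = np.abs(fav[cols].to_numpy() - x_query.to_numpy()).sum(axis=1).argsort()[:30]
df_poisoned = df.copy()
df_poisoned.loc[fav.index[nearest], "target"] = 0

cost_poisoned = recourse_cost(df_poisoned, x_query)
print(f"recourse cost - clean: {cost_clean:.2f}, poisoned: {cost_poisoned:.2f}")
```

In a study like the one summarized above, such a measurement would be repeated over many query instances (and subgroups) to obtain the instance-, subgroup-, and population-level effects.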
📝 Abstract
Counterfactual explanations provide a popular method for analyzing the predictions of black-box systems, and they can offer the opportunity for computational recourse by suggesting actionable changes to the input that would yield a different (i.e., more favorable) system output. However, recent work has highlighted their vulnerability to different types of manipulation. This work studies the vulnerability of counterfactual explanations to data poisoning. We formally introduce and investigate data poisoning in the context of counterfactual explanations, aimed at increasing the cost of recourse at three different levels: locally for a single instance, for a sub-group of instances, or globally for all instances. In this context, we characterize several data poisoning attacks and prove their correctness. We also empirically demonstrate that state-of-the-art counterfactual generation methods and toolboxes are vulnerable to such data poisoning.
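As a rough formalization of these three levels (the notation here is an illustrative assumption, not necessarily the paper's), write $h_D$ for the model trained on data set $D$ and define the cost of recourse for an instance $x$ as the cost of its cheapest counterfactual:

$$
c(x; h) \;=\; \min_{x'} \, d(x, x') \quad \text{s.t. } h(x') \neq h(x).
$$

A poisoning attack then replaces $D$ with a poisoned set $\tilde{D}$ so as to increase this cost for a single target instance $x_0$ (local), on average over a sub-group $G$ (sub-group), or in expectation over all instances (global):

$$
c(x_0; h_{\tilde{D}}) \gg c(x_0; h_D), \qquad
\frac{1}{|G|}\sum_{x \in G} c(x; h_{\tilde{D}}) \gg \frac{1}{|G|}\sum_{x \in G} c(x; h_D), \qquad
\mathbb{E}_{x}\big[c(x; h_{\tilde{D}})\big] \gg \mathbb{E}_{x}\big[c(x; h_D)\big].
$$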