Quantifying the Privacy of Counterfactuals by Leveraging Membership Inference Attacks Against Synthetic Data

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Although counterfactual explanations enhance model interpretability, they may inadvertently leak private information about training data. This work introduces a novel privacy attack paradigm by adapting membership inference attacks from the synthetic data domain to the counterfactual setting. The proposed method requires no access to the original model and operates solely on a set of generated counterfactual examples. By integrating counterfactual generation, membership inference, and privacy risk quantification, the study demonstrates that an adversary can effectively determine whether a given individual was part of the training dataset. These findings reveal that publicly releasing counterfactual explanations entails significant privacy risks, challenging the assumption that such post-hoc explanations are inherently safe.

📝 Abstract

Counterfactuals are typically used in high-stakes decision areas to explain a machine learning model by showing how changes to a user's profile would result in the desired outcome. However, these explanations can also be exploited by an adversary to conduct privacy attacks against the model or its training data. Building on the analogy that counterfactuals, like synthetic data, provide realistic substitutes for real training data, we demonstrate in this paper that attacks developed against synthetic data can be successfully applied to counterfactuals. More precisely, we investigate the effectiveness of membership inference attacks designed for synthetic data on various types of counterfactuals. Additionally, while existing membership inference attacks against counterfactuals usually require query access to the model, we show that successful membership inference attacks can be performed using only a set of counterfactuals, with no access to the model from which they were generated. Our results demonstrate that model developers should be more cautious when releasing counterfactuals to users, as doing so can lead to a privacy breach.
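
To make the model-free setting concrete, here is a minimal sketch of one simple membership inference heuristic borrowed from the synthetic data literature: flag a target record as a likely training member when it lies unusually close to some released counterfactual. The abstract does not name the specific attacks evaluated, so this distance-to-closest-record rule, the function names, and the threshold are illustrative assumptions and not the paper's method.

```python
import math

def closest_distance(target, counterfactuals):
    """Euclidean distance from one target record to its nearest counterfactual."""
    return min(math.dist(target, cf) for cf in counterfactuals)

def infer_membership(targets, counterfactuals, threshold):
    """Guess membership using only the released counterfactuals (no model queries).

    Assumption: a record closer than `threshold` to any counterfactual is
    guessed to be a training member. The threshold is a hypothetical
    parameter the adversary would have to calibrate.
    """
    return [closest_distance(t, counterfactuals) <= threshold for t in targets]

# Toy numeric records; real data would need feature scaling and encoding.
released = [(0.1, 0.0), (5.0, 5.0)]
candidates = [(0.0, 0.0), (10.0, 10.0)]
guesses = infer_membership(candidates, released, threshold=1.0)
```

In this toy example only the first candidate falls within the threshold of a released counterfactual, so it alone is flagged. In practice the adversary would calibrate the threshold on reference data they control.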

Problem

Research questions and friction points this paper is trying to address.

counterfactuals

privacy

membership inference attacks

synthetic data

machine learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

counterfactuals

membership inference attacks

synthetic data