Privacy Auditing Synthetic Data Release through Local Likelihood Attacks

📅 2025-08-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Privacy leakage auditing in synthetic data publication lacks reliable, general-purpose evaluation methods. This paper proposes Gen-LRA, a novel no-box membership inference attack framework that requires neither model knowledge nor prior assumptions. Gen-LRA is the first to exploit local overfitting of generative models near the training distribution: it constructs surrogate models and quantifies how test samples perturb the local likelihood ratio to precisely detect exposure risk of training data. By circumventing the limitations of heuristic-based attacks, Gen-LRA is systematically validated across diverse datasets and model architectures. Benchmark experiments demonstrate that Gen-LRA consistently outperforms existing membership inference attacks in success rate, robustness, and generalizability. It establishes a reproducible, interpretable auditing paradigm for quantifying privacy risks in synthetic data generation.

📝 Abstract
Auditing the privacy leakage of synthetic data is an important but unresolved problem. Most existing privacy auditing frameworks for synthetic data rely on heuristics and unreasonable assumptions to attack the failure modes of generative models, exhibiting limited capability to describe and detect the privacy exposure of training data through synthetic data release. In this paper, we study designing Membership Inference Attacks (MIAs) that specifically exploit the observation that tabular generative models tend to significantly overfit to certain regions of the training distribution. Here, we propose Generative Likelihood Ratio Attack (Gen-LRA), a novel, computationally efficient No-Box MIA that, with no assumption of model knowledge or access, formulates its attack by evaluating the influence a test observation has in a surrogate model's estimation of a local likelihood ratio over the synthetic data. Assessed over a comprehensive benchmark spanning diverse datasets, model architectures, and attack parameters, we find that Gen-LRA consistently dominates other MIAs for generative models across multiple performance metrics. These results underscore Gen-LRA's effectiveness as a privacy auditing tool for the release of synthetic data, highlighting the significant privacy risks posed by generative model overfitting in real-world applications.
Problem

Research questions and friction points this paper is trying to address.

Auditing privacy leakage in synthetic data release
Detecting training data exposure via synthetic outputs
Addressing generative model overfitting privacy risks
Innovation

Methods, ideas, or system contributions that make the work stand out.

No-Box MIA with local likelihood ratio evaluation
Computationally efficient privacy auditing without model access
Exploits generative model overfitting through surrogate estimation
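The core idea above can be sketched in code. The snippet below is a minimal illustration of a local likelihood-ratio membership score, not the paper's implementation: it assumes Gaussian KDEs as the surrogate density estimators and a k-nearest-neighbor notion of "local", both of which are illustrative choices rather than details taken from Gen-LRA.

```python
# Hedged sketch of the local likelihood-ratio idea: does adding a test
# point to the reference data raise a surrogate model's likelihood of
# the synthetic samples nearest that point? If so, the generator has
# likely overfit around it. KDE surrogates and k-NN locality are
# assumptions for illustration, not the paper's exact method.
import numpy as np
from scipy.stats import gaussian_kde

def gen_lra_score(x_test, synthetic, reference, k=50):
    """Higher score -> x_test is more likely a training member."""
    # Restrict the likelihood ratio to synthetic points near x_test.
    dists = np.linalg.norm(synthetic - x_test, axis=1)
    local = synthetic[np.argsort(dists)[:k]]

    # Surrogate density fit on reference data alone, and on the
    # reference data augmented with the test observation.
    kde_ref = gaussian_kde(reference.T)
    kde_aug = gaussian_kde(np.vstack([reference, x_test]).T)

    # Local log-likelihood ratio over the neighborhood of x_test.
    return float(np.sum(np.log(kde_aug(local.T)) - np.log(kde_ref(local.T))))
```

A member whose neighborhood the generator has memorized should receive a markedly higher score than an arbitrary non-member, which is what a no-box auditor thresholds on.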
Joshua Ward
Department of Statistics, University of California Los Angeles

Chi-Hua Wang
Department of Supply Chain and Operations Management, Purdue University
Dynamic Pricing · Bandit Algorithms · Synthetic Data Generation · Differential Privacy

Guang Cheng
Department of Statistics, University of California Los Angeles