Risk-Equalized Differentially Private Synthetic Data: Protecting Outliers by Controlling Record-Level Influence

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of differentially private synthetic data in protecting anomalous individuals—such as patients with rare diseases—who face significantly higher success rates in membership inference attacks. To mitigate this risk, the authors propose a risk-balanced differentially private synthesis framework that first evaluates the anomaly level of each record using a small privacy budget and then inversely weights records by their risk during generative model training to attenuate the influence of high-risk samples. This approach provides stronger privacy guarantees by incorporating record-level risk awareness into differentially private synthesis for the first time, enabling targeted protection for highly anomalous individuals and yielding a closed-form per-record privacy bound. Experiments on both synthetic and real-world datasets (e.g., Breast Cancer, Adult) demonstrate a substantial reduction in membership inference success against high-anomaly records, with performance contingent on the synergy between the risk scorer and the synthesis pipeline.

📝 Abstract
When synthetic data is released, some individuals are harder to protect than others. A patient with a rare disease combination or a transaction with unusual characteristics stands out from the crowd. Differential privacy provides worst-case guarantees, but empirical attacks -- particularly membership inference -- succeed far more often against such outliers, especially under moderate privacy budgets and with auxiliary information. This paper introduces risk-equalized DP synthesis, a framework that prioritizes protection for high-risk records by reducing their influence on the learned generator. The mechanism operates in two stages: first, a small privacy budget estimates each record's "outlierness"; second, a DP learning procedure weights each record inversely to its risk score. Under Gaussian mechanisms, a record's privacy loss is proportional to its influence on the output -- so deliberately shrinking outliers' contributions yields tighter per-instance privacy bounds for precisely those records that need them most. We prove end-to-end DP guarantees via composition and derive closed-form per-record bounds for the synthesis stage (the scoring stage adds a uniform per-record term). Experiments on simulated data with controlled outlier injection show that risk-weighting substantially reduces membership inference success against high-outlierness records; ablations confirm that targeting -- not random downweighting -- drives the improvement. On real-world benchmarks (Breast Cancer, Adult, German Credit), gains are dataset-dependent, highlighting the interplay between scorer quality and the synthesis pipeline.
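The two-stage mechanism in the abstract can be sketched in a few lines of NumPy. This is a hypothetical toy illustration, not the authors' implementation: the function names (`dp_outlier_scores`, `risk_inverse_weights`, `weighted_gaussian_release`), the Laplace-noised-mean scorer, and the weighted clipped-mean "synthesis" step are all stand-ins chosen to make the core idea concrete -- a record with weight w_i and clip norm C contributes at most w_i * C to the released statistic, so its Gaussian-mechanism privacy loss scales with w_i.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_outlier_scores(X, eps_score=0.1):
    """Stage 1: spend a small budget to estimate each record's
    'outlierness'. Toy scorer: distance to a noisily released mean
    (Laplace noise; per-coordinate sensitivity assumed bounded by 1/n)."""
    n, d = X.shape
    noisy_mean = X.mean(axis=0) + rng.laplace(scale=1.0 / (n * eps_score), size=d)
    return np.linalg.norm(X - noisy_mean, axis=1)

def risk_inverse_weights(scores, floor=0.1):
    """Weight records inversely to risk, normalized so the maximum
    weight is 1; `floor` keeps weights bounded away from zero so
    low-weight records still contribute to utility."""
    w = 1.0 / (1.0 + scores)
    return np.clip(w / w.max(), floor, 1.0)

def weighted_gaussian_release(X, weights, clip=1.0, sigma=2.0):
    """Stage 2 (toy stand-in for generator training): a weighted,
    clipped sum released via the Gaussian mechanism. Record i's
    influence on the output is at most weights[i] * clip, which is
    what yields a per-record privacy bound proportional to its weight."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_clipped = X * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    stat = (weights[:, None] * X_clipped).sum(axis=0)
    return stat + rng.normal(scale=sigma * clip, size=X.shape[1])

# Controlled outlier injection, as in the paper's simulated experiments.
X = rng.normal(size=(200, 5))
X[0] += 8.0  # record 0 is the planted outlier

scores = dp_outlier_scores(X)
w = risk_inverse_weights(scores)
release = weighted_gaussian_release(X, w)
```

Because the planted outlier receives a high score and therefore a small weight, its influence on the release -- and hence its per-record privacy loss -- is deliberately shrunk, which is the targeting effect the ablations isolate.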
Problem

Research questions and friction points this paper is trying to address.

differential privacy
synthetic data
outliers
membership inference
privacy risk
Innovation

Methods, ideas, or system contributions that make the work stand out.

risk-equalized differential privacy
synthetic data
outlier protection
record-level influence
membership inference
Amir Asiaee
Department of Biostatistics, Vanderbilt University Medical Center, 2525 West End Avenue, Nashville, TN 37203, USA
Chao Yan
Instructor at DBMI, VUMC; CS PhD from Vanderbilt U
AI for medicine · Synthetic health data · Privacy · Fairness
Zachary B. Abrams
Institute for Informatics, Washington University, 4444 Forest Park Avenue, St. Louis, MO 63108, USA
Bradley A. Malin
Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA