🤖 AI Summary
This work addresses the vulnerability of differentially private synthetic data in protecting anomalous individuals—such as patients with rare diseases—who face significantly higher success rates in membership inference attacks. To mitigate this risk, the authors propose a risk-balanced differentially private synthesis framework that first evaluates the anomaly level of each record using a small privacy budget and then inversely weights records by their risk during generative model training to attenuate the influence of high-risk samples. This approach provides stronger privacy guarantees by incorporating record-level risk awareness into differentially private synthesis for the first time, enabling targeted protection for highly anomalous individuals and yielding a closed-form per-record privacy bound. Experiments on both synthetic and real-world datasets (e.g., Breast Cancer, Adult) demonstrate a substantial reduction in membership inference success against high-anomaly records, with performance contingent on the synergy between the risk scorer and the synthesis pipeline.
📝 Abstract
When synthetic data is released, some individuals are harder to protect than others. A patient with a rare disease combination or a transaction with unusual characteristics stands out from the crowd. Differential privacy provides worst-case guarantees, but empirical attacks -- particularly membership inference -- succeed far more often against such outliers, especially under moderate privacy budgets and with auxiliary information. This paper introduces risk-equalized DP synthesis, a framework that prioritizes protection for high-risk records by reducing their influence on the learned generator. The mechanism operates in two stages: first, a small privacy budget estimates each record's "outlierness"; second, a DP learning procedure weights each record inversely to its risk score. Under Gaussian mechanisms, a record's privacy loss is proportional to its influence on the output -- so deliberately shrinking outliers' contributions yields tighter per-instance privacy bounds for precisely those records that need them most. We prove end-to-end DP guarantees via composition and derive closed-form per-record bounds for the synthesis stage (the scoring stage adds a uniform per-record term). Experiments on simulated data with controlled outlier injection show that risk-weighting substantially reduces membership inference success against high-outlierness records; ablations confirm that targeting -- not random downweighting -- drives the improvement. On real-world benchmarks (Breast Cancer, Adult, German Credit), gains are dataset-dependent, highlighting the interplay between scorer quality and synthesis pipeline.
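To make the two-stage mechanism concrete, here is a minimal NumPy sketch of the idea, not the paper's actual algorithm: a hypothetical DP "outlierness" score (distance to a noisy mean, with illustrative noise calibration) followed by a Gaussian-mechanism release in which each record's clipped contribution is scaled by an inverse-risk weight, so a record's sensitivity -- and hence its per-record privacy loss -- shrinks with its risk score. The function names, the weighting rule `1 / (1 + score)`, and the noise scales are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def risk_scores(X, eps_score=0.1):
    """Stage 1 (hypothetical scorer): crude DP outlierness estimate.
    Each record is scored by its distance to a noise-perturbed mean;
    the noise scale stands in for calibration to a small budget eps_score."""
    noisy_mean = X.mean(axis=0) + rng.normal(
        0.0, 1.0 / (len(X) * eps_score), X.shape[1]
    )
    d = np.linalg.norm(X - noisy_mean, axis=1)
    return d / d.max()  # normalize scores to [0, 1]

def risk_weighted_dp_mean(X, scores, clip=1.0, sigma=1.0):
    """Stage 2 (toy 'synthesis' stand-in): a Gaussian-mechanism release
    of a weighted mean. Each record is clipped to norm <= clip, then
    scaled by an inverse-risk weight w_i, so its contribution -- and
    per-record sensitivity -- is bounded by w_i * clip <= clip."""
    w = 1.0 / (1.0 + scores)                  # inverse-risk weights in (0.5, 1]
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    clipped = X * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    weighted_sum = (w[:, None] * clipped).sum(axis=0)
    noise = rng.normal(0.0, sigma * clip, X.shape[1])  # calibrated to max_i w_i*clip <= clip
    return (weighted_sum + noise) / w.sum()

# Usage: 50 inliers at the origin plus one injected outlier.
X = np.vstack([np.zeros((50, 2)), np.array([[10.0, 10.0]])])
scores = risk_scores(X)
release = risk_weighted_dp_mean(X, scores, clip=1.0, sigma=0.1)
```

The same pattern extends to DP-SGD-style generator training: replace the clipped records with per-example clipped gradients and apply the inverse-risk weights before noising, which is how the abstract's "weights each record inversely to its risk score" would plug into a standard Gaussian-mechanism analysis.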