PRISM: Differentially Private Synthetic Data with Structure-Aware Budget Allocation for Prediction

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes PRISM, a task-oriented framework for differentially private synthetic data generation. Existing methods typically treat all features symmetrically and spread noise uniformly, ignoring structural information relevant to the downstream prediction task. PRISM introduces a structure-aware privacy budget allocation mechanism that selects a predictive feature subset and allocates budget according to the structural knowledge available: known causal parents of the target, a Bayesian network structure, or, when neither is available, differentially private feature selection. The method comes with end-to-end privacy guarantees and risk bounds. Empirically, task-aware allocation improves prediction accuracy over generic synthesizers. Under distribution shift, targeting causal parents achieves an AUC of approximately 0.73, while correlation-based selection falls to chance (approximately 0.49).
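To make the idea of non-uniform budget allocation concrete, here is a minimal sketch in Python. It does not reproduce the paper's bound or mechanism. It assumes a hypothetical surrogate bound of the form sum_i w_i / eps_i (the expected error of a Laplace-noised statistic scales as 1/eps_i) and assumes pure-DP sequential composition, so the per-statistic budgets must sum to the total. The function names and the weights are illustrative assumptions.

```python
import math
from typing import Dict


def allocate_budget(weights: Dict[str, float], eps_total: float) -> Dict[str, float]:
    """Split eps_total across statistics to minimize sum_i w_i / eps_i.

    Assumption (not from the paper): the error bound is a weighted sum of
    inverse budgets. Setting the Lagrangian's gradient to zero gives
    eps_i proportional to sqrt(w_i).
    """
    if eps_total <= 0:
        raise ValueError("eps_total must be positive")
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and strictly positive")
    roots = {name: math.sqrt(w) for name, w in weights.items()}
    normalizer = sum(roots.values())
    return {name: eps_total * r / normalizer for name, r in roots.items()}


def surrogate_bound(weights: Dict[str, float], eps: Dict[str, float]) -> float:
    """Evaluate the hypothetical surrogate bound sum_i w_i / eps_i."""
    return sum(w / eps[name] for name, w in weights.items())


# Hypothetical importance weights: statistic "a" matters more for predicting Y.
weights = {"a": 4.0, "b": 1.0}
task_aware = allocate_budget(weights, eps_total=3.0)
uniform = {name: 3.0 / len(weights) for name in weights}
```

Under this assumed bound, the task-aware split gives the more important statistic a larger share of the budget and yields a smaller bound than the uniform split at the same total privacy cost.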

📝 Abstract
Differential privacy (DP) provides a mathematical guarantee limiting what an adversary can learn about any individual from released data. However, achieving this protection typically requires adding noise, and noise can accumulate when many statistics are measured. Existing DP synthetic data methods treat all features symmetrically, spreading noise uniformly even when the data will serve a specific prediction task. We develop a prediction-centric approach operating in three regimes depending on available structural knowledge. In the causal regime, when the causal parents of $Y$ are known and distribution shift is expected, we target the parents for robustness. In the graphical regime, when a Bayesian network structure is available and the distribution is stable, the Markov blanket of $Y$ provides a sufficient feature set for optimal prediction. In the predictive regime, when no structural knowledge exists, we select features via differentially private methods without claiming to recover causal or graphical structure. We formalize this as PRISM, a mechanism that (i) identifies a predictive feature subset according to the appropriate regime, (ii) constructs targeted summary statistics, (iii) allocates budget to minimize an upper bound on prediction error, and (iv) synthesizes data via graphical-model inference. We prove end-to-end privacy guarantees and risk bounds. Empirically, task-aware allocation improves prediction accuracy compared to generic synthesizers. Under distribution shift, targeting causal parents achieves AUC $\approx 0.73$ while correlation-based selection collapses to chance ($\approx 0.49$).
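The difference between the causal and graphical regimes comes down to which feature set is targeted: the parents of $Y$ or its Markov blanket (parents, children, and the children's other parents). The following Python sketch illustrates both selections on a known DAG. The graph encoding (a dict mapping each node to its set of parents), the function names, and the toy graph are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, Set


def causal_parents(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return the direct parents of `target` (the causal-regime feature set)."""
    return set(parents.get(target, set()))


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return parents, children, and co-parents of `target` in the DAG."""
    blanket = set(parents.get(target, set()))  # parents of the target
    for node, node_parents in parents.items():
        if target in node_parents:
            blanket.add(node)        # node is a child of the target
            blanket |= node_parents  # the child's other parents (co-parents)
    blanket.discard(target)          # the target is not part of its own blanket
    return blanket


# Hypothetical toy DAG: A -> Y, B -> Y, Y -> C, D -> C, E -> D
toy_dag = {
    "Y": {"A", "B"},
    "C": {"Y", "D"},
    "D": {"E"},
    "A": set(),
    "B": set(),
    "E": set(),
}

parent_set = causal_parents(toy_dag, "Y")
blanket_set = markov_blanket(toy_dag, "Y")
```

In this toy graph the parent set is {A, B}, while the Markov blanket also includes the child C and the co-parent D. E lies outside both sets because it reaches Y's neighborhood only through D.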
Problem

Research questions and friction points this paper is trying to address.

differential privacy
synthetic data
prediction
budget allocation
distribution shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

differentially private synthetic data
structure-aware budget allocation
prediction-centric privacy
causal feature selection
Markov blanket
Amir Asiaee
Department of Biostatistics, Vanderbilt University Medical Center, 2525 West End Avenue, Nashville, TN 37203, USA
Chao Yan
Instructor at DBMI, VUMC; CS PhD from Vanderbilt U
AI for medicine, Synthetic health data, Privacy, Fairness
Zachary B. Abrams
Institute for Informatics, Washington University, 4444 Forest Park Avenue, St. Louis, MO 63108, USA
Bradley A. Malin
Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA