AI Summary
This work addresses fairness concerns in differentially private synthetic data, which often arise from spurious associations between sensitive attributes and outcomes. To mitigate this issue, the authors propose PrivCI, a method that incorporates conditional independence (CI) constraints into the differentially private data synthesis process to eliminate such biases. The core innovation is a CI-aware greedy minimum spanning tree algorithm that integrates feasibility checks and the exponential mechanism into Kruskal's construction, enabling graph structure learning that jointly preserves privacy, fairness, and data fidelity. Experiments on standard fairness benchmarks show that PrivCI outperforms existing approaches in data fidelity and predictive accuracy while strictly adhering to the prescribed CI constraints.
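The structure-learning step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the edge scores (assumed to be precomputed quality scores with a known sensitivity), the even split of the privacy budget across rounds, and all function names are hypothetical. A constraint (s, y, A) is treated as satisfied by a forest when every s-to-y path passes through some node in A, which is how conditional independence is read off a tree-structured graphical model. Each Kruskal round discards edges that would close a cycle or violate a constraint, then picks among the remaining edges with the exponential mechanism.

```python
import math
import random


def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng):
    """Sample one candidate with probability proportional to exp(eps * score / (2 * sensitivity))."""
    best = max(scores)  # shift for numerical stability; the distribution is unchanged
    weights = [math.exp(epsilon * (s - best) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def tree_path(adj, src, dst):
    """Return the unique src-to-dst path in a forest, or None if they are disconnected."""
    parent = {src: None}
    stack = [src]
    while stack:
        node = stack.pop()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in adj[node]:
            if nbr not in parent:
                parent[nbr] = node
                stack.append(nbr)
    return None


def violates(adj, constraints):
    """True if some constraint (s, y, cond) has an s-y path whose interior avoids cond."""
    for s, y, cond in constraints:
        path = tree_path(adj, s, y)
        if path is not None and not (set(path[1:-1]) & set(cond)):
            return True
    return False


def ci_aware_kruskal(nodes, scores, constraints, epsilon, sensitivity=1.0, seed=0):
    """Hypothetical CI-aware Kruskal: exponential-mechanism picks restricted to feasible edges."""
    if len(nodes) < 2:
        return []
    rng = random.Random(seed)
    parent = {v: v for v in nodes}  # union-find structure for cycle detection

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    adj = {v: set() for v in nodes}
    tree = []
    rounds = len(nodes) - 1
    eps_round = epsilon / rounds  # even budget split across rounds (assumption)
    for _ in range(rounds):
        candidates = []
        for u, v in scores:
            if find(u) == find(v):
                continue  # edge would close a cycle
            adj[u].add(v)
            adj[v].add(u)
            feasible = not violates(adj, constraints)  # tentative feasibility check
            adj[u].discard(v)
            adj[v].discard(u)
            if feasible:
                candidates.append((u, v))
        if not candidates:
            break  # no feasible edge remains: return a forest
        u, v = exponential_mechanism(
            candidates, [scores[e] for e in candidates], eps_round, sensitivity, rng
        )
        parent[find(u)] = find(v)
        adj[u].add(v)
        adj[v].add(u)
        tree.append((u, v))
    return tree
```

In this sketch the feasibility test depends only on the public constraints and on edges already released by the exponential mechanism, so it acts as post-processing and consumes no extra budget. Even a very high score on a spurious edge such as ("S", "Y") cannot make it selectable when the constraint is ("S", "Y", ["A"]).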
Abstract
Differential privacy (DP) enables safe data release, with synthetic data generation emerging as a common approach in recent years. Yet standard synthesizers preserve all dependencies in the data, including spurious correlations between sensitive attributes and outcomes. In fairness-critical settings, this reproduces unwanted bias. A principled remedy is to enforce conditional independence (CI) constraints, which encode domain knowledge or legal requirements that outcomes be independent of sensitive attributes once admissible factors are accounted for. DP synthesis typically proceeds in two phases: (i) a measurement step that privatizes selected marginals, often structured via minimum spanning trees (MSTs), and (ii) a reconstruction step that fits a probabilistic model consistent with the noisy marginals. We propose PrivCI, which enforces CI during the measurement step via a CI-aware greedy MST algorithm that integrates feasibility checks into Kruskal's construction under the exponential mechanism, improving accuracy over competing methods. Experiments on standard fairness benchmarks show that PrivCI achieves stronger fidelity and predictive accuracy than prior baselines while satisfying the specified CI constraints.
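To make measurement step (i) concrete, the following hedged Python sketch privatizes a set of already-selected two-way marginals. It is not PrivCI's code. It assumes the Laplace mechanism under add/remove-one-record neighbors (one record changes one cell by 1, so each table has L1 sensitivity 1) and an even budget split by sequential composition; MST-based synthesizers often use Gaussian noise instead. The record format and the helpers `laplace`, `noisy_marginal`, and `measure` are hypothetical. Reconstruction step (ii) would then fit a probabilistic model consistent with these noisy tables (not shown).

```python
import random
from collections import Counter
from itertools import product


def laplace(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two Exp(rate = 1/scale) variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_marginal(records, attrs, domains, epsilon, rng):
    """Noisy contingency table over attrs (L1 sensitivity 1 under add/remove-one neighbors)."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    scale = 1.0 / epsilon
    return {
        cell: counts.get(cell, 0) + laplace(scale, rng)
        for cell in product(*(domains[a] for a in attrs))  # include empty cells too
    }


def measure(records, selected, domains, total_epsilon, seed=0):
    """Privatize each selected marginal, splitting the budget evenly (sequential composition)."""
    rng = random.Random(seed)
    eps_each = total_epsilon / len(selected)
    return {attrs: noisy_marginal(records, attrs, domains, eps_each, rng) for attrs in selected}
```

Noise is added to every cell of the attribute domain, not only to observed combinations, because releasing only the nonzero cells would reveal which combinations occur in the data.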