🤖 AI Summary
This study addresses the lack of open frameworks and benchmarks for evaluating privacy risks in synthetic health data, which hinders its safe deployment. To this end, we propose SynQP, the first open-source framework enabling systematic benchmarking of both utility and privacy risks of synthetic data without access to real sensitive records, using simulated sensitive data instead. We introduce a more equitable metric for identity disclosure risk and conduct a comprehensive evaluation integrating differential privacy (DP), CTGAN generative models, membership inference attacks (MIA), and identity disclosure risk (IDR). Experimental results demonstrate that non-private models achieve near-perfect utility (≥0.97), while DP-enhanced models consistently reduce both identity disclosure and MIA risks below the regulatory threshold of 0.09.
📝 Abstract
The use of synthetic data in health applications raises privacy concerns, yet the lack of open frameworks for privacy evaluations has slowed its adoption. A major challenge is the absence of accessible benchmark datasets for evaluating privacy risks, due to difficulties in acquiring sensitive data. To address this, we introduce SynQP, an open framework for benchmarking privacy in synthetic data generation (SDG) using simulated sensitive data, ensuring that original data remains confidential. We also highlight the need for privacy metrics that fairly account for the probabilistic nature of machine learning models. As a demonstration, we use SynQP to benchmark CTGAN and propose a new identity disclosure risk metric that offers a more accurate estimation of privacy risks compared to existing approaches. Our work provides a critical tool for improving the transparency and reliability of privacy evaluations, enabling safer use of synthetic data in health-related applications. Our privacy assessments (Table II) reveal that DP consistently lowers both identity disclosure risk (SD-IDR) and membership inference attack risk (SD-MIA), with all DP-augmented models staying below the 0.09 regulatory threshold. Code available at https://github.com/CAN-SYNH/SynQP
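To make the idea of a membership inference attack on synthetic data concrete, the following is a minimal sketch of a generic distance-based attack on toy tabular records. It is not the SD-MIA or SD-IDR metric from the paper, whose definitions are not given in the abstract. The function names, the Hamming-distance rule, the `max_dist` parameter, and the toy records are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical toy record type: (sex, age, diagnosis code).
Record = Tuple[object, ...]


def hamming(a: Record, b: Record) -> int:
    """Count the attribute positions where two records differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(record: Record, synthetic: Sequence[Record]) -> int:
    """Distance from a record to its closest synthetic record."""
    return min(hamming(record, s) for s in synthetic)


def attack_advantage(
    members: Sequence[Record],
    non_members: Sequence[Record],
    synthetic: Sequence[Record],
    max_dist: int = 0,
) -> float:
    """True-positive rate minus false-positive rate of a simple attacker.

    The attacker guesses "was in the training data" whenever some synthetic
    record lies within max_dist of the candidate record. This is an
    illustrative heuristic, not the paper's metric.
    """
    tpr = sum(min_distance(r, synthetic) <= max_dist for r in members) / len(members)
    fpr = sum(min_distance(r, synthetic) <= max_dist for r in non_members) / len(non_members)
    return tpr - fpr


# Simulated (not real) sensitive records, in the spirit of the framework.
members = [("F", 34, "A"), ("M", 51, "B"), ("F", 29, "C"), ("M", 62, "A")]
non_members = [("M", 45, "C"), ("F", 70, "B"), ("M", 23, "A"), ("F", 38, "B")]
synthetic = [("F", 34, "A"), ("M", 50, "B"), ("F", 71, "B"), ("M", 62, "C")]

exact_match_advantage = attack_advantage(members, non_members, synthetic, max_dist=0)
near_match_advantage = attack_advantage(members, non_members, synthetic, max_dist=1)
print(exact_match_advantage, near_match_advantage)
```

In this toy setup the exact-match attacker gains an advantage only because one training record was reproduced verbatim in the synthetic set, while the looser attacker flags members and non-members equally often. How such a score relates to the 0.09 threshold cited above depends on the paper's own metric definitions, which this sketch does not reproduce.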