SynQP: A Framework and Metrics for Evaluating the Quality and Privacy Risk of Synthetic Data

📅 2025-08-26

🏛️ Conference on Privacy, Security and Trust

🤖 AI Summary

This study addresses the lack of open frameworks and benchmarks for evaluating privacy risks in synthetic health data, a gap that hinders its safe deployment. To this end, the authors propose SynQP, an open-source framework that benchmarks both the utility and the privacy risks of synthetic data without access to real sensitive records, using simulated sensitive data instead. They also introduce a fairer identity disclosure risk metric and evaluate CTGAN generative models, with and without differential privacy (DP), against membership inference attacks (MIA) and identity disclosure risk (IDR). Experimental results show that non-private models achieve near-perfect utility (≥0.97), while DP-enhanced models consistently keep both identity disclosure and MIA risks below the regulatory threshold of 0.09.
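
The summary does not spell out how MIA risk is measured. As a rough illustration only, the Python sketch below implements a common distance-to-closest-record (DCR) membership inference heuristic: the attacker guesses that a record was in the training set when some synthetic record lies within a distance threshold of it, and the attack's advantage is its true positive rate on training records minus its false positive rate on held-out records. The function names, the 0.1 threshold and the toy Gaussian data are assumptions for illustration; this is not SynQP's SD-MIA metric or code.

```python
import math
import random
from typing import List, Sequence

Row = Sequence[float]


def distance_to_closest_record(query: Row, synthetic: List[Row]) -> float:
    """Euclidean distance from one record to its nearest synthetic record."""
    return min(math.dist(query, s) for s in synthetic)


def dcr_membership_advantage(train: List[Row], holdout: List[Row],
                             synthetic: List[Row], threshold: float) -> float:
    """Hypothetical DCR attack: predict 'member' when a record lies within
    `threshold` of some synthetic record. Returns TPR - FPR, with TPR measured
    on training records and FPR on held-out (non-member) records."""
    tpr = sum(distance_to_closest_record(r, synthetic) <= threshold for r in train) / len(train)
    fpr = sum(distance_to_closest_record(r, synthetic) <= threshold for r in holdout) / len(holdout)
    return tpr - fpr


def gaussian_rows(rng: random.Random, n: int, d: int, scale: float = 1.0) -> List[List[float]]:
    """Toy data: n rows of d independent Gaussian features (illustrative only)."""
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    train = gaussian_rows(rng, 200, 3)
    holdout = gaussian_rows(rng, 200, 3)
    # A "leaky" generator that emits near-copies of training rows.
    leaky = [[x + rng.gauss(0.0, 0.01) for x in row] for row in train]
    # A generator whose output is independent of the training rows.
    independent = gaussian_rows(rng, 200, 3)
    print(dcr_membership_advantage(train, holdout, leaky, threshold=0.1))
    print(dcr_membership_advantage(train, holdout, independent, threshold=0.1))
```

An advantage near 1 means the synthetic data effectively reveals who was in the training set, while a value near 0 means the attacker does no better than on non-members. The pure-Python nearest-neighbour search is quadratic and only suitable for toy sizes.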

📝 Abstract

The use of synthetic data in health applications raises privacy concerns, yet the lack of open frameworks for privacy evaluations has slowed its adoption. A major challenge is the absence of accessible benchmark datasets for evaluating privacy risks, due to difficulties in acquiring sensitive data. To address this, we introduce SynQP, an open framework for benchmarking privacy in synthetic data generation (SDG) using simulated sensitive data, ensuring that the original data remain confidential. We also highlight the need for privacy metrics that fairly account for the probabilistic nature of machine learning models. As a demonstration, we use SynQP to benchmark CTGAN and propose a new identity disclosure risk metric that estimates privacy risks more accurately than existing approaches. Our work provides a critical tool for improving the transparency and reliability of privacy evaluations, enabling safer use of synthetic data in health-related applications. Our privacy assessments (Table II) reveal that DP consistently lowers both identity disclosure risk (SD-IDR) and membership inference attack risk (SD-MIA), with all DP-augmented models staying below the 0.09 regulatory threshold. Code is available at https://github.com/CAN-SYNH/SynQP
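
For orientation on what an identity disclosure metric measures, the Python sketch below computes a classic matching-based proxy: each synthetic record is matched to real records on chosen quasi-identifiers, and a match against a real equivalence class of size k contributes a 1/k chance of correct re-identification (no match contributes 0). This is an assumed baseline, not the SD-IDR metric proposed in the paper, which per the abstract also accounts for the probabilistic nature of the generator; the function name, quasi-identifiers and toy records are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

# The 0.09 threshold quoted in the abstract.
REGULATORY_THRESHOLD = 0.09


def naive_identity_disclosure_risk(real_rows: List[Dict], synth_rows: List[Dict],
                                   quasi_ids: Sequence[str]) -> float:
    """Hypothetical proxy (NOT SynQP's SD-IDR): mean over synthetic records of
    1/k, where k is the number of real records sharing the synthetic record's
    quasi-identifier values (contributes 0 when no real record matches)."""
    def key(row: Dict) -> tuple:
        return tuple(row[q] for q in quasi_ids)

    # Size of each equivalence class in the real data.
    class_sizes = Counter(key(r) for r in real_rows)
    if not synth_rows:
        return 0.0
    total = 0.0
    for s in synth_rows:
        k = class_sizes.get(key(s), 0)
        total += 1.0 / k if k else 0.0
    return total / len(synth_rows)


if __name__ == "__main__":
    real = [{"age": 30, "sex": "F"}, {"age": 30, "sex": "F"}, {"age": 45, "sex": "M"}]
    synth = [{"age": 30, "sex": "F"}, {"age": 45, "sex": "M"}, {"age": 60, "sex": "F"}]
    risk = naive_identity_disclosure_risk(real, synth, ["age", "sex"])
    print(risk, risk <= REGULATORY_THRESHOLD)  # prints: 0.5 False
```

In the toy example, one synthetic record matches a class of two real records (1/2), one matches a unique real record (1), and one matches nothing (0), giving (0.5 + 1 + 0) / 3 = 0.5, far above 0.09. A deterministic count like this ignores how likely the generator was to produce each match, which is the kind of shortcoming the abstract says its new metric is designed to address.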

Problem

synthetic data

privacy risk

benchmarking

health applications

privacy metrics

Innovation

Synthetic Data

Privacy Risk Evaluation

Benchmarking Framework