Empirical Evaluation of Structured Synthetic Data Privacy Metrics: Novel experimental framework

📅 2025-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current evaluations of synthetic data privacy lack quantifiable, comparable metrics because privacy definitions are ambiguous and existing measures fail to reflect real-world disclosure risks. Method: We propose the first benchmark framework based on deliberate risk insertion, integrating legal theory with a no-box threat model, to enable reproducible, cross-method assessment of privacy-utility trade-offs. The approach systematically controls perturbations, models no-box attacks, maps outputs to regulatory compliance criteria, and validates findings on public datasets. Contribution/Results: Empirical evaluation reveals substantial discrepancies between mainstream privacy metrics (e.g., k-anonymity, differential privacy estimates) and actual re-identification risks under realistic attack scenarios. This work establishes the first evaluation paradigm for privacy-enhancing technologies (PETs) that is simultaneously interpretable, empirically grounded, and aligned with regulatory requirements, thereby bridging theoretical guarantees, practical security, and legal accountability.

📝 Abstract
Synthetic data generation is gaining traction as a privacy enhancing technology (PET). When properly generated, synthetic data preserve the analytic utility of real data while avoiding the retention of information that would allow the identification of specific individuals. However, the concept of data privacy remains elusive, making it challenging for practitioners to evaluate and benchmark the degree of privacy protection offered by synthetic data. In this paper, we propose a framework to empirically assess the efficacy of tabular synthetic data privacy quantification methods through controlled, deliberate risk insertion. To demonstrate this framework, we survey existing approaches to synthetic data privacy quantification and the related legal theory. We then apply the framework to the main privacy quantification methods with no-box threat models on publicly available datasets.
Problem

Research questions and friction points this paper is trying to address.

Evaluating synthetic data privacy protection efficacy
Assessing privacy quantification methods through risk insertion
Benchmarking tabular synthetic data privacy metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Framework for empirical privacy assessment
Controlled risk insertion in synthetic data
Evaluation of no-box threat models
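The core idea of deliberate risk insertion can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: it plants exact copies of real records into a synthetic dataset and checks that a simple distance-to-closest-record (DCR) statistic, one common privacy heuristic, detects the planted leakage. The function names (`insert_risk`, `dcr`) and the Gaussian toy data are assumptions made for this sketch.

```python
import numpy as np

def insert_risk(real, synthetic, n_leaks, rng):
    """Deliberately copy n_leaks real records into the synthetic set:
    a planted, known privacy risk that a good metric should flag."""
    idx = rng.choice(len(real), size=n_leaks, replace=False)
    return np.vstack([synthetic, real[idx]])

def dcr(real, synthetic):
    """Distance from each synthetic record to its closest real record.
    Zero distances indicate verbatim leakage of real records."""
    diffs = synthetic[:, None, :] - real[None, :, :]
    return np.linalg.norm(diffs, axis=2).min(axis=1)

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 5))
clean = rng.normal(size=(200, 5))           # stand-in "synthetic" data
leaky = insert_risk(real, clean, 10, rng)   # same data with 10 planted leaks

# A metric sensitive to re-identification risk should score the leaky
# set as riskier (here: exact-copy distances collapse to zero).
print(dcr(real, clean).min() > 0)           # True: no verbatim copies
print(int((dcr(real, leaky) == 0).sum()))   # 10: all planted copies found
```

Because the number and location of inserted risks are controlled, the benchmark knows the ground truth and can compare how sharply different privacy quantification methods react to it.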
Milton Nicolás Plasencia Palacios
Aindo SpA, Trieste, Italy
Alexander Boudewijn
Aindo SpA, Trieste, Italy
Sebastiano Saccani
Aindo SpA, Trieste, Italy
Andrea Filippo Ferraris
LAST-JD, Alma AI, Alma Mater Studiorum, University of Bologna & DIKE research group, PREC department, Vrije Universiteit Brussel
Diana Sofronieva
Aindo SpA, Trieste, Italy
Giuseppe D'Acquisto
Luiss University, Rome, Italy
Filiberto Brozzetti
Luiss University, Rome, Italy
Daniele Panfilo
Aindo SpA, Trieste, Italy
Luca Bortolussi
Università di Trieste
modelling and simulation, explainable artificial intelligence, machine learning, formal methods, cyber-physical systems