TVineSynth: A Truncated C-Vine Copula Generator of Synthetic Tabular Data to Balance Privacy and Utility

📅 2025-03-20
🤖 AI Summary
This paper addresses the challenge of balancing privacy preservation and data utility in synthetic tabular data generation. It proposes TVineSynth, a privacy-enhanced generative framework based on truncated C-vine copula structures. TVineSynth jointly optimizes vine tree truncation and directed bias injection to selectively eliminate privacy-leaking dependencies, such as those exploited by membership inference attacks (MIA) or attribute inference attacks (AIA), while preserving statistical dependencies critical for downstream utility. Notably, it provides the first theoretical privacy guarantee against AIA under continuous sensitive attributes. Experiments on both synthetic and real-world datasets demonstrate that TVineSynth significantly outperforms state-of-the-art methods, with and without differential privacy, in resisting MIA and AIA, while improving average downstream prediction accuracy by 12.6%.

📝 Abstract
We propose TVineSynth, a vine copula based synthetic tabular data generator designed to balance privacy and utility, using the vine tree structure and its truncation to control the trade-off. Unlike synthetic data generators that achieve differential privacy (DP) by globally adding noise, TVineSynth performs a controlled approximation of the estimated data generating distribution, so that the resulting synthetic data does not suffer from poor utility in downstream prediction tasks. TVineSynth introduces a targeted bias into the vine copula model that, combined with the specific tree structure of the vine, causes the model to zero out privacy-leaking dependencies while relying on those that are beneficial for utility. Privacy is measured here with membership inference attacks (MIA) and attribute inference attacks (AIA). Further, we theoretically justify how the construction of TVineSynth ensures AIA privacy under a natural privacy measure for continuous sensitive attributes. When compared to competitor models, with and without DP, on simulated and on real-world data, TVineSynth achieves a superior privacy-utility balance.
Problem

Research questions and friction points this paper is trying to address.

Balances privacy and utility in synthetic data generation
Uses vine copula model to control data approximation
Ensures privacy against membership and attribute inference attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses vine copula for synthetic data generation
Balances privacy and utility via truncation
Targets bias to zero out privacy-leaking dependencies
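To make the truncation idea concrete, here is a minimal sketch of a C-vine truncated after its first tree. It is not the authors' implementation: it assumes Gaussian pair copulas, a fixed root variable, and a fixed truncation level, and it leaves out the targeted bias and any privacy evaluation. All function names (`normal_scores`, `pearson`, `fit_tree1`, `sample_truncated`) and the toy data are hypothetical, chosen for illustration only.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the rank-to-score transform


def normal_scores(column):
    """Map a column to normal scores via its ranks (pseudo-observations)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_tree1(columns, root):
    """Tree-1 parameters of a Gaussian C-vine: correlation of each column with the root."""
    z = [normal_scores(c) for c in columns]
    return [1.0 if j == root else pearson(z[root], z[j]) for j in range(len(z))]


def sample_truncated(rhos, root, n, rng):
    """Sample normal scores from the vine truncated after tree 1.

    Truncation replaces every pair copula in tree 2 and above by independence,
    so all non-root variables are conditionally independent given the root.
    """
    rows = []
    for _ in range(n):
        z_root = rng.gauss(0.0, 1.0)
        row = []
        for j, rho in enumerate(rhos):
            if j == root:
                row.append(z_root)
            else:
                noise_scale = math.sqrt(max(0.0, 1.0 - rho * rho))
                row.append(rho * z_root + noise_scale * rng.gauss(0.0, 1.0))
        rows.append(row)
    return rows


# Toy data (assumption): x1 and x2 depend on the root x0 and on a shared extra factor.
rng = random.Random(0)
n = 2000
x0 = [rng.gauss(0, 1) for _ in range(n)]
shared = [rng.gauss(0, 1) for _ in range(n)]
x1 = [0.6 * a + 0.6 * s + 0.3 * rng.gauss(0, 1) for a, s in zip(x0, shared)]
x2 = [0.6 * a + 0.6 * s + 0.3 * rng.gauss(0, 1) for a, s in zip(x0, shared)]

rhos = fit_tree1([x0, x1, x2], root=0)
synthetic = sample_truncated(rhos, root=0, n=5000, rng=rng)
syn1 = [row[1] for row in synthetic]
syn2 = [row[2] for row in synthetic]

print("original corr(x1, x2): ", round(pearson(normal_scores(x1), normal_scores(x2)), 2))
print("synthetic corr(x1, x2):", round(pearson(syn1, syn2), 2))
print("rho_01 * rho_02:       ", round(rhos[1] * rhos[2], 2))
```

In this toy setup, the part of the dependence between `x1` and `x2` that is not explained by the root is dropped from the synthetic data, while each variable's dependence on the root is kept. The samples stay on the normal-score scale; mapping them back to the original margins, and choosing the vine order and truncation level as TVineSynth does, is not reproduced here.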
Elisabeth Griesbauer
Norwegian Computing Center, University of Oslo, Integreat - Norwegian Centre for Knowledge-driven Machine Learning
C. Czado
Technical University of Munich, Munich Data Science Institute
Arnoldo Frigessi
University of Oslo
Statistics, Data Science, Machine Learning
I. H. Haff
University of Oslo, Integreat - Norwegian Centre for Knowledge-driven Machine Learning