Dependency-aware synthetic tabular data generation

📅 2025-07-25
🤖 AI Summary
Existing synthetic tabular data generation models struggle to preserve functional dependencies (FDs) and logical dependencies (LDs) among attributes. The resulting structural distortions limit their applicability in privacy-sensitive domains such as healthcare. To address this, we propose the Hierarchical Feature Generation Framework (HFGF), a model-agnostic approach that explicitly encodes FD/LD constraints without modifying the underlying generative model (e.g., CTGAN, TVAE, GReaT). HFGF builds the data in two stages: it first generates the independent features, then reconstructs the dependent features from them according to the FD/LD rules. Experiments on four benchmark datasets show that HFGF improves the FD/LD preservation of six state-of-the-art models. The resulting synthetic data exhibit improved structural consistency and better downstream performance in classification and regression tasks. The approach offers a model-agnostic route to higher-fidelity synthetic tabular data.

📝 Abstract
Synthetic tabular data is increasingly used in privacy-sensitive domains such as healthcare, but existing generative models often fail to preserve inter-attribute relationships. In particular, functional dependencies (FDs) and logical dependencies (LDs), which capture deterministic and rule-based associations between features, are often poorly retained in synthetic datasets, if they are retained at all. To address this research gap, we propose the Hierarchical Feature Generation Framework (HFGF) for synthetic tabular data generation. The framework first generates independent features using any standard generative model, and then reconstructs dependent features based on predefined FD and LD rules. To evaluate HFGF, we created benchmark datasets with known dependencies. Our experiments on four benchmark datasets with varying sizes, feature imbalance, and dependency complexity demonstrate that HFGF improves the preservation of FDs and LDs across six generative models, including CTGAN, TVAE, and GReaT. Our findings indicate that HFGF can significantly enhance the structural fidelity and downstream utility of synthetic tabular data.
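
The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy columns, and the bootstrap sampler standing in for a real generative model (CTGAN, TVAE, GReaT) are all assumptions made for illustration, and the FD rule is encoded as a simple lookup table learned from the real rows.

```python
import random

def learn_fd_map(rows, determinant, dependent):
    """Build a lookup determinant value -> dependent value from real rows."""
    mapping = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        # setdefault stores the first value seen; any later mismatch breaks the FD.
        if mapping.setdefault(key, value) != value:
            raise ValueError(f"{determinant} -> {dependent} is not functional")
    return mapping

def generate_independent(rows, columns, n, rng):
    """Stand-in for a generative model: bootstrap-sample the independent columns."""
    sampled = []
    for _ in range(n):
        picked = rng.choice(rows)
        sampled.append({c: picked[c] for c in columns})
    return sampled

def hierarchical_generate(rows, fd_rules, n, seed=0):
    """Stage 1: generate independent features. Stage 2: rebuild dependent ones.

    Assumes every determinant in fd_rules is itself an independent feature.
    """
    rng = random.Random(seed)
    dependents = {dep for _, dep in fd_rules}
    independents = [c for c in rows[0] if c not in dependents]
    synthetic = generate_independent(rows, independents, n, rng)
    for det, dep in fd_rules:
        mapping = learn_fd_map(rows, det, dep)
        for row in synthetic:
            row[dep] = mapping[row[det]]  # the FD holds by construction
    return synthetic

# Hypothetical toy data with the FD city -> country.
real = [
    {"city": "Rostock", "country": "Germany", "age": 34},
    {"city": "Pune", "country": "India", "age": 51},
    {"city": "Rostock", "country": "Germany", "age": 27},
]
synthetic = hierarchical_generate(real, [("city", "country")], n=5)
```

Because the dependent column is never produced by the generator itself, any model can be swapped into `generate_independent` without touching the dependency logic, which is the sense in which such a scheme is model-agnostic.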
Problem

Research questions and friction points this paper is trying to address.

Preserving inter-attribute relationships in synthetic tabular data
Addressing poor retention of functional and logical dependencies
Enhancing structural fidelity of synthetic data for downstream utility
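
To make "poor retention" of a functional dependency concrete, the sketch below lists the determinant values that map to more than one dependent value in a dataset. The helper name `fd_violations` and the toy rows are hypothetical, and the paper's own evaluation measure may differ.

```python
from collections import defaultdict

def fd_violations(rows, determinant, dependent):
    """Return determinant values that co-occur with more than one dependent value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    return {key: values for key, values in seen.items() if len(values) > 1}

# Hypothetical synthetic rows: the second row breaks city -> country.
broken = [
    {"city": "Rostock", "country": "Germany"},
    {"city": "Rostock", "country": "India"},
    {"city": "Pune", "country": "India"},
]
consistent = [
    {"city": "Rostock", "country": "Germany"},
    {"city": "Pune", "country": "India"},
]

violations = fd_violations(broken, "city", "country")
no_violations = fd_violations(consistent, "city", "country")
```

An empty result means the dependency is fully preserved in that dataset; a non-empty one pinpoints which determinant values were distorted.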
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Feature Generation Framework for tabular data
Reconstructs dependent features using FD and LD rules
Improves preservation of functional and logical dependencies
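
The summary does not say how LD rules are encoded. One plausible reading, assumed here, is that a logical dependency restricts the dependent feature to a subset of values given the determinant. Under that assumption, a dependent feature could be reconstructed by sampling only from the values observed for each determinant value; all names and data below are hypothetical.

```python
import random
from collections import defaultdict

def learn_allowed_values(rows, determinant, dependent):
    """Collect the dependent values observed for each determinant value.

    Duplicates are kept so that sampling follows the observed frequencies.
    """
    allowed = defaultdict(list)
    for row in rows:
        allowed[row[determinant]].append(row[dependent])
    return dict(allowed)

def fill_logical_dependent(synthetic, allowed, determinant, dependent, rng):
    """Assign each synthetic row a dependent value permitted by its determinant."""
    for row in synthetic:
        row[dependent] = rng.choice(allowed[row[determinant]])
    return synthetic

# Hypothetical real data: "male" rows only ever have pregnant == "no".
real_rows = [
    {"sex": "male", "pregnant": "no"},
    {"sex": "female", "pregnant": "yes"},
    {"sex": "female", "pregnant": "no"},
    {"sex": "male", "pregnant": "no"},
]
allowed = learn_allowed_values(real_rows, "sex", "pregnant")

# Independent feature assumed to be already generated in the first stage.
stage_one = [{"sex": "male"}, {"sex": "female"}, {"sex": "male"}, {"sex": "female"}]
filled = fill_logical_dependent(stage_one, allowed, "sex", "pregnant", random.Random(0))
```

Unlike the FD case, the dependent value is not unique here, so the rule only rules out impossible combinations rather than fixing a single value.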
Authors

Chaithra Umesh
Institute of Computer Science, University of Rostock, Germany
Kristian Schultz
Institute of Computer Science, University of Rostock, Germany
Manjunath Mahendra
Institute of Computer Science, University of Rostock, Germany
Saptarshi Bej
School of Data Science, Indian Institute of Science Education and Research Thiruvananthapuram, India
Olaf Wolkenhauer
Professor for Systems Biology and Bioinformatics
Systems Theory, Data Science