CausalWrap: Model-Agnostic Causal Constraint Wrappers for Tabular Synthetic Data

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing tabular synthetic data generators, while capable of matching observed marginal distributions, often fail to preserve the underlying causal structure, limiting their utility in causal inference and out-of-distribution tasks. This work proposes CausalWrap—a model-agnostic post-processing framework that injects partial causal priors (e.g., trusted or forbidden edges, monotonicity constraints) to differentiably refine the outputs of any pretrained generator, such as GANs, VAEs, or diffusion models. By integrating augmented Lagrangian optimization with differentiable correction maps, CausalWrap enforces causal constraints while preserving the fidelity of the joint distribution, all without requiring access to the generator’s internal architecture. Experiments demonstrate that CausalWrap reduces average treatment effect (ATE) estimation error by 63% on the ACIC benchmark and improves ATE consistency from 0.00 to 0.38 on the MIMIC-IV dataset, while largely retaining conventional data utility.

📝 Abstract
Tabular synthetic data generators are typically trained to match observational distributions, which can yield high conventional utility (e.g., column correlations, predictive accuracy) yet poor preservation of structural relations relevant to causal analysis and out-of-distribution (OOD) reasoning. When the downstream use of synthetic data involves causal reasoning -- estimating treatment effects, evaluating policies, or testing mediation pathways -- merely matching the observational distribution is insufficient: structural fidelity and treatment-mechanism preservation become essential. We propose CausalWrap (CW), a model-agnostic wrapper that injects partial causal knowledge (PCK) -- trusted edges, forbidden edges, and qualitative/monotonic constraints -- into any pretrained base generator (GAN, VAE, or diffusion model), without requiring access to its internals. CW learns a lightweight, differentiable post-hoc correction map applied to samples from the base generator, optimized with causal penalty terms under an augmented-Lagrangian schedule. We provide theoretical results connecting penalty-based optimization to constraint satisfaction and relating approximate factorization to joint distributional control. We validate CW on simulated structural causal models (SCMs) with known ground-truth interventions, semi-synthetic causal benchmarks (IHDP and an ACIC-style suite), and a real-world ICU cohort (MIMIC-IV) with expert-elicited partial graphs. CW improves causal fidelity across diverse base generators -- e.g., reducing average treatment effect (ATE) error by up to 63% on ACIC and lifting ATE agreement from 0.00 to 0.38 on the intensive care unit (ICU) cohort -- while largely retaining conventional utility.
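The augmented-Lagrangian schedule described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: the "correction map" is reduced to a single shift parameter applied to one column of base-generator samples, and the "causal constraint" is a stand-in equality constraint on the corrected mean. All variable names are hypothetical; the point is the alternation between inner gradient steps on the penalized objective and outer multiplier (dual-ascent) updates.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=0.5, size=1000)  # samples from a pretrained "base generator"

target_mean = 2.0   # toy stand-in for a causal constraint: E[y'] must equal this
theta = 0.0         # parameter of the post-hoc correction map y' = y + theta
lam, rho = 0.0, 10.0  # Lagrange multiplier and quadratic penalty weight

for outer in range(20):            # outer loop: multiplier updates
    for inner in range(200):       # inner loop: gradient descent on the AL objective
        g = (y + theta).mean() - target_mean  # constraint violation
        # fidelity term theta^2 keeps the correction small;
        # gradient of  theta^2 + lam*g + (rho/2)*g^2  w.r.t. theta:
        grad = 2 * theta + lam + rho * g
        theta -= 0.01 * grad
    # dual-ascent step on the multiplier
    lam += rho * ((y + theta).mean() - target_mean)

y_corrected = y + theta
```

At convergence the constraint is satisfied (the corrected mean hits the target) with the smallest shift that achieves it, mirroring how the paper's wrapper trades fidelity to the base generator against causal-constraint satisfaction without touching the generator itself.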
Problem

Research questions and friction points this paper is trying to address.

causal fidelity
synthetic data
structural causal models
out-of-distribution reasoning
treatment effect estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

CausalWrap
causal constraints
model-agnostic
synthetic data
structural causal models
Amir Asiaee
Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
Zhuohui J. Liang
Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
Chao Yan
Instructor at DBMI, VUMC; CS PhD from Vanderbilt U
AI for medicine · Synthetic health data · Privacy · Fairness