🤖 AI Summary
Problem: Real-world time-series generation faces critical bottlenecks, including distortion of causal structure and inadequate modeling of dynamic lags. Method: We propose a three-stage causal simulation framework: (i) lag-aware causal graph estimation, (ii) approximation of nonlinear functional dependencies via neural ODEs or MLPs, and (iii) joint noise distribution modeling using a VAE-GAN hybrid. Our approach introduces the first model-agnostic, customizable pipeline for causal time-series generation, complemented by a min-max, AutoML-driven optimization against adversarial discriminators and a multi-dimensional evaluation paradigm that balances fidelity and verifiability. Contribution/Results: Experiments on real, semi-synthetic, and synthetic datasets demonstrate substantial improvements in the causal fidelity and temporal plausibility of generated sequences. The method generalizes better in downstream causal inference and intervention modeling tasks, outperforming existing baselines in both qualitative and quantitative assessments.
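The three stages listed in the summary above can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: every function name here is hypothetical, stage (i) is replaced by thresholding the coefficients of a full lagged linear regression, stage (ii) by a linear least-squares fit instead of a neural network, and stage (iii) by resampling fitted residuals instead of a learned generative noise model. The threshold value is an assumption that depends on the scale of the data.

```python
import random

def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_linear(data, target, parents, max_lag, ridge=1e-6):
    """Least-squares fit of data[target][t] on an intercept and lagged parents.

    parents is a list of (variable index, lag) pairs. Returns (weights, residuals).
    """
    T = len(data[0])
    X = [[1.0] + [data[i][t - lag] for i, lag in parents] for t in range(max_lag, T)]
    y = [data[target][t] for t in range(max_lag, T)]
    k = len(parents) + 1
    # Normal equations with a tiny ridge term for numerical stability.
    A = [[sum(r[a] * r[b] for r in X) + (ridge if a == b else 0.0)
          for b in range(k)] for a in range(k)]
    g = [sum(r[a] * yt for r, yt in zip(X, y)) for a in range(k)]
    w = solve(A, g)
    resid = [yt - sum(wi * xi for wi, xi in zip(w, r)) for r, yt in zip(X, y)]
    return w, resid

def estimate_graph(data, max_lag, threshold):
    """Stage (i) stand-in: keep lagged parents whose regression coefficient is large."""
    n = len(data)
    candidates = [(i, lag) for i in range(n) for lag in range(1, max_lag + 1)]
    graph = {}
    for j in range(n):
        w, _ = fit_linear(data, j, candidates, max_lag)
        graph[j] = [c for c, wi in zip(candidates, w[1:]) if abs(wi) > threshold]
    return graph

def fit_model(data, graph, max_lag):
    """Stages (ii) and (iii) stand-in: refit on selected parents, keep residuals as noise pool."""
    return {j: fit_linear(data, j, parents, max_lag) for j, parents in graph.items()}

def simulate(data, graph, model, max_lag, length, rng):
    """Roll the fitted model forward, seeding with the first max_lag observed values."""
    n = len(data)
    sim = [list(data[j][:max_lag]) for j in range(n)]
    for t in range(max_lag, length):
        for j in range(n):
            w, resid = model[j]
            mean = w[0] + sum(wi * sim[i][t - lag] for wi, (i, lag) in zip(w[1:], graph[j]))
            sim[j].append(mean + rng.choice(resid))
    return sim

# Toy "real" data with known lagged structure: x0[t-1] -> x0[t] and x0[t-2] -> x1[t].
rng = random.Random(0)
T = 2000
x0, x1 = [0.0, 0.0], [0.0, 0.0]
for t in range(2, T):
    x0.append(0.5 * x0[t - 1] + rng.gauss(0, 1))
    x1.append(0.8 * x0[t - 2] + rng.gauss(0, 0.5))
data = [x0, x1]

graph = estimate_graph(data, max_lag=2, threshold=0.2)
model = fit_model(data, graph, max_lag=2)
sim = simulate(data, graph, model, max_lag=2, length=500, rng=rng)
print(graph)
```

Because each stage is a separate function, any one of them could be swapped for a stronger method (a dedicated temporal causal discovery algorithm, a nonlinear regressor, a learned noise model) without touching the others, which is the sense in which such a pipeline is model-agnostic.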
📝 Abstract
Causal discovery (CD) plays a pivotal role in revealing relationships among observed variables, particularly in the temporal setting. While the majority of CD methods rely on synthetic data for evaluation, and more recently for training, such data fall short of accurately mirroring real-world scenarios, a shortcoming that is even more evident in temporal data. Generation techniques that depend on simplified assumptions about causal structure, effects, and time limit the quality and diversity of the simulated data. In this work, we introduce Temporal Causal-based Simulation (TCS), a robust framework for generating realistic time-series data and their associated temporal causal graphs. The approach is structured in three phases: estimating the true lagged causal structure of the data, approximating the functional dependencies between variables, and learning the noise distribution of the corresponding causal model. Each phase can be explicitly tailored to data assumptions and characteristics. Through an extensive evaluation process, we show that no single detection method is adequate for discriminating generated data from real data, which makes discrimination a multifaceted challenge. To address this, we detail a Min-max optimization phase that draws on AutoML techniques. Our contributions include a flexible, model-agnostic pipeline for generating realistic temporal causal data, a thorough evaluation setup that enhances the validity of the generated datasets, and insights into the challenges posed by realistic data generation. Through experiments involving not only real but also semi-synthetic and purely synthetic datasets, we demonstrate that, while sampling realistic causal data remains a complex task, our method enriches the domain of generating sensible causal-based temporal data.
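One plausible reading of the Min-max optimization phase is: score each candidate generated dataset by the strongest of several discriminators, then keep the candidate whose worst-case score is lowest. The sketch below illustrates that idea only. It is an assumption, not the procedure from the paper: the detectors are fixed summary statistics rather than AutoML-tuned classifiers, and all names (`DETECTORS`, `worst_case_auc`, `select_min_max`) are hypothetical.

```python
import random

def auc(pos, neg):
    """Probability that a random pos score exceeds a random neg score (ties count half)."""
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def mean(w):
    return sum(w) / len(w)

def variance(w):
    m = mean(w)
    return sum((v - m) ** 2 for v in w) / len(w)

def lag1_autocov(w):
    m = mean(w)
    return sum((a - m) * (b - m) for a, b in zip(w, w[1:])) / (len(w) - 1)

# Toy detectors: each maps a window of the series to a single score.
DETECTORS = {"mean": mean, "variance": variance, "lag1_autocov": lag1_autocov}

def windows(series, size):
    """Split a series into non-overlapping windows of the given size."""
    return [series[k:k + size] for k in range(0, len(series) - size + 1, size)]

def worst_case_auc(real, fake, size=50):
    """Inner max: the best separation any detector achieves between real and fake."""
    scores = {}
    for name, stat in DETECTORS.items():
        a = auc([stat(w) for w in windows(real, size)],
                [stat(w) for w in windows(fake, size)])
        scores[name] = max(a, 1.0 - a)  # direction-agnostic separation
    return max(scores.values()), scores

def select_min_max(real, candidates, size=50):
    """Outer min: pick the candidate that the strongest detector separates least."""
    results = {name: worst_case_auc(real, fake, size)[0]
               for name, fake in candidates.items()}
    best = min(results, key=results.get)
    return best, results

def ar1(phi, n, rng, shift=0.0):
    """Toy AR(1) series with unit-variance Gaussian noise and an optional mean shift."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return [v + shift for v in x]

rng = random.Random(1)
real = ar1(0.5, 2000, rng)
candidates = {
    "white_noise": ar1(0.0, 2000, rng),          # wrong dynamics
    "same_process": ar1(0.5, 2000, rng),         # matches the real process
    "shifted_mean": ar1(0.5, 2000, rng, shift=2.0),  # right dynamics, wrong level
}
best, results = select_min_max(real, candidates)
print(best, results)
```

Taking the maximum over detectors reflects the abstract's point that a single detection method is inadequate: a candidate that fools the mean-based detector may still be exposed by the autocovariance-based one, so only the worst case is a meaningful score.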