🤖 AI Summary
Traditional Monte Carlo simulations rely on ad hoc assumptions and struggle to generate data that reflect realistic multilevel structures, which weakens the validity of evaluations of quantitative methods. To address this, the authors propose a six-stage workflow that integrates generative AI with multilevel data simulation. They adapt diffusion models and generative adversarial networks (GANs) to hierarchical data structures and introduce a synthetic data quality assessment framework that checks both within-table and between-table fidelity. An empirical demonstration with real multilevel social science data illustrates that the approach can make Monte Carlo simulations more realistic and provide a more empirically grounded benchmark for evaluating predictive performance and parameter recovery in quantitative methods.
📝 Abstract
AI-generated synthetic data have recently been used to support more realistic Monte Carlo simulations. However, guidance is limited on generating data with multilevel structures and on designing simulations based on such data. This study proposes a general framework for AI-based simulation studies that evaluate the predictive performance and parameter recovery of quantitative methods, specifically using multilevel data commonly observed in the social sciences. Our proposed six-stage workflow consists of (i) specifying a method and real data, (ii) training a generative AI model on the real data, (iii) assessing synthetic data quality, (iv) designing and conducting simulations, (v) evaluating method performance, and (vi) checking robustness. To enhance fidelity in multilevel data generation, we also introduce targeted modifications to diffusion models and generative adversarial networks (GANs). Furthermore, we develop a systematic quality evaluation framework that assesses both within-table and between-table fidelity, and we discuss how AI-based simulation designs should differ depending on whether the simulation's objective is predictive performance or parameter recovery. Finally, using empirical multilevel data and multilevel modeling methods, we demonstrate the utility of the proposed AI-based simulation framework. Compared with traditional simulation studies based on arbitrary simulated scenarios, this approach supports more accurate and honest evaluations of how quantitative methods perform on real-world data.
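As a rough illustration of what a between-table fidelity check in stage (iii) might look like, the sketch below compares the intraclass correlation (ICC) of a reference dataset with that of two synthetic candidates. The choice of ICC as the statistic, the function names `simulate_two_level` and `icc_oneway`, and the simulated stand-in for "real" data are all assumptions made for this example; the abstract does not specify which metrics the authors' framework uses.

```python
import random
from statistics import mean


def simulate_two_level(n_groups, group_size, between_sd, within_sd, seed):
    """Draw a balanced two-level sample y_ij = u_j + e_ij (hypothetical helper)."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        u = rng.gauss(0.0, between_sd)  # group-level (level-2) effect
        groups.append([u + rng.gauss(0.0, within_sd) for _ in range(group_size)])
    return groups


def icc_oneway(groups):
    """One-way ANOVA estimator of the ICC for balanced groups."""
    n = len(groups)       # number of groups
    k = len(groups[0])    # observations per group
    grand = mean(y for g in groups for y in g)
    group_means = [mean(g) for g in groups]
    # Mean square between groups and mean square within groups.
    msb = k * sum((m - grand) ** 2 for m in group_means) / (n - 1)
    msw = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Stand-in for real multilevel data (assumption: population ICC = 1 / (1 + 4) = 0.2).
real = simulate_two_level(200, 20, between_sd=1.0, within_sd=2.0, seed=1)
# A synthetic sample that preserves the clustering structure.
synthetic_good = simulate_two_level(200, 20, between_sd=1.0, within_sd=2.0, seed=2)
# A synthetic sample that ignores clustering (no group-level variance).
synthetic_flat = simulate_two_level(200, 20, between_sd=0.0, within_sd=2.0, seed=3)

icc_gap_good = abs(icc_oneway(real) - icc_oneway(synthetic_good))
icc_gap_flat = abs(icc_oneway(real) - icc_oneway(synthetic_flat))
print(f"ICC gap, structure-preserving generator: {icc_gap_good:.3f}")
print(f"ICC gap, structure-ignoring generator:   {icc_gap_flat:.3f}")
```

A generator that reproduces each variable's marginal distribution but loses the group-level variance would pass many within-table checks and still show a large ICC gap, which is one reason to assess the two kinds of fidelity separately.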