AI Summary
This work proposes a unified framework for generating multivariate discrete data with user-specified correlation structures and marginal distributions belonging to the generalized Poisson, negative binomial, or binomial families, three classes that existing methods struggle to handle effectively within a single model. By integrating iterative conditional sampling, correlation correction, probability integral transformation, and discretization strategies, the proposed algorithm accurately matches both target marginal distributions and prescribed correlation matrices. Extensive experiments across four simulation scenarios and three real-world datasets demonstrate that the method efficiently produces synthetic multivariate discrete data conforming to the desired statistical properties. The approach is particularly well-suited for simulation and modeling tasks in fields such as biology, medicine, and social sciences, where realistic discrete multivariate data generation is essential.
Abstract
The analysis of multivariate discrete data is crucial in various scientific research areas, such as epidemiology, the social sciences, genomics, and environmental studies. As the availability of such data increases, developing robust analytical and data generation tools is necessary to understand the relationships among variables. This paper builds upon previous work on data generation frameworks for multivariate ordinal data with a prespecified correlation matrix. The proposed algorithm generates multivariate discrete data from marginal distributions that follow the generalized Poisson, negative binomial, and binomial distributions. A step-by-step algorithm is provided, and its performance is illustrated in four simulated data scenarios and three real-data scenarios. This technique has the potential to be applied in a wide range of settings involving the generation of correlated discrete data.
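To make the general idea of generating correlated counts concrete, the sketch below uses a plain Gaussian copula: draw correlated normals, map them to uniforms with the probability integral transformation, then discretize through each marginal's quantile function. This is a stand-in technique, not the paper's algorithm; it omits the iterative conditional sampling and correlation correction steps, so the correlation of the generated counts will generally differ somewhat from the latent correlation supplied. The function name `sample_correlated_counts` and all parameter values are hypothetical, and the generalized Poisson marginal is left out because SciPy does not ship one.

```python
import numpy as np
from scipy import stats


def sample_correlated_counts(n, latent_corr, marginals, seed=0):
    """Gaussian-copula sketch (hypothetical helper, not the paper's method).

    n           : number of rows to generate
    latent_corr : correlation matrix of the latent normal vector
    marginals   : list of frozen scipy.stats discrete distributions
    """
    rng = np.random.default_rng(seed)
    dim = len(marginals)
    # Step 1: correlated standard normals.
    z = rng.multivariate_normal(np.zeros(dim), latent_corr, size=n)
    # Step 2: probability integral transformation to uniforms on (0, 1).
    u = stats.norm.cdf(z)
    # Step 3: discretize each column with the marginal's quantile function.
    cols = [m.ppf(u[:, j]) for j, m in enumerate(marginals)]
    return np.column_stack(cols).astype(int)


# Illustrative values only: one negative binomial and one binomial marginal.
latent_corr = np.array([[1.0, 0.5],
                        [0.5, 1.0]])
marginals = [stats.nbinom(5, 0.4), stats.binom(10, 0.3)]
counts = sample_correlated_counts(20000, latent_corr, marginals)

print(counts.mean(axis=0))        # sample means of the two count variables
print(np.corrcoef(counts.T)[0, 1])  # achieved correlation between them
```

Comparing the printed correlation with the latent value of 0.5 shows the gap that a correction step, such as the one the abstract's algorithm includes, is meant to close.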