Enhancing Diversity and Feasibility: Joint Population Synthesis from Multi-source Data Using Generative Models

2026-02-16
AI Summary
This study addresses the limitations of existing synthetic population generation methods, which often rely on a single data source or on sequential fusion strategies. Such methods fail to model complex dependencies between variables and handle both sampling zeros and structural zeros poorly, which reduces the diversity and feasibility of the generated populations. To overcome these challenges, the work proposes a joint end-to-end synthesis framework based on a Wasserstein Generative Adversarial Network (WGAN) that, for the first time, integrates multi-source demographic data simultaneously. An inverse gradient penalty regularizer is also introduced to explicitly promote the generation of feasible and diverse attribute combinations. Experimental results show that the joint approach improves recall by 7% and precision by 15% over the sequential baseline; adding the proposed regularizer yields a further 10% gain in recall and a 1% gain in precision. The method achieves a five-metric similarity score of 88.1, compared with 84.6 for the sequential baseline.
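
For orientation, the standard critic objective of a WGAN with gradient penalty can be written as below. This is the widely used textbook form, not a formula taken from this paper; the symbols $D$ (critic), $P_r$ (real data distribution), $P_g$ (generator distribution), and $\lambda$ (penalty weight) are notation assumed here for illustration. The paper's inverse gradient penalty, which is added to the generator loss, is not specified on this page and is therefore not reproduced.

```latex
% Standard WGAN-GP critic loss (assumed notation, not the paper's own)
\begin{align}
L_D &= \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
     - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
     + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
       \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big], \\
\hat{x} &= \epsilon x + (1 - \epsilon)\,\tilde{x},
  \qquad \epsilon \sim U[0, 1].
\end{align}
```

The penalty term pushes the critic's gradient norm toward 1 at points interpolated between real and generated samples, which is how this variant enforces the Lipschitz constraint in place of weight clipping.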

๐Ÿ“ Abstract
Generating realistic synthetic populations is essential for agent-based models (ABM) in transportation and urban planning. Current methods face two major limitations. First, many rely on a single dataset or follow a sequential data fusion and generation process, so they fail to capture the complex interplay between features. Second, these approaches struggle with sampling zeros (valid but unobserved attribute combinations) and structural zeros (infeasible combinations due to logical constraints), which reduces the diversity and feasibility of the generated data. This study proposes a novel method to simultaneously integrate and synthesize multi-source datasets using a Wasserstein Generative Adversarial Network (WGAN) with gradient penalty. This joint learning method improves both the diversity and feasibility of synthetic data by adding a regularization term (inverse gradient penalty) to the generator loss function. For the evaluation, we implement a unified similarity metric and place special emphasis on measuring diversity and feasibility through recall, precision, and the F1 score. Results show that the proposed joint approach outperforms the sequential baseline, with recall increasing by 7% and precision by 15%. Additionally, the regularization term further improves diversity and feasibility, reflected in a 10% increase in recall and a 1% increase in precision. We assess distributional similarity using a five-metric score. The joint approach performs better overall, reaching a score of 88.1 compared to 84.6 for the sequential method. Since synthetic populations serve as a key input for ABM, this multi-source generative approach has the potential to significantly enhance the accuracy and reliability of ABM.
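
The link between recall, precision, and the two kinds of zeros can be illustrated with a minimal sketch. It assumes one common set-based reading of the metrics: precision is the share of distinct synthetic attribute combinations that also occur in a reference population (a proxy for feasibility), and recall is the share of distinct reference combinations that the synthetic data recovers (a proxy for diversity). The function name `combination_scores`, the attribute values, and the toy data are all hypothetical; the paper's exact definitions may differ.

```python
from typing import Dict, Iterable, Tuple

Combo = Tuple[str, ...]  # one attribute combination, e.g. (age group, employment, licence)


def combination_scores(reference: Iterable[Combo],
                       synthetic: Iterable[Combo]) -> Dict[str, float]:
    """Set-based precision, recall and F1 over distinct attribute combinations.

    Assumed definitions (illustrative, not taken from the paper):
      precision = |ref ∩ syn| / |syn|   (feasibility: few infeasible combinations)
      recall    = |ref ∩ syn| / |ref|   (diversity: many valid combinations covered)
    """
    ref, syn = set(reference), set(synthetic)
    if not ref or not syn:
        raise ValueError("both populations must contain at least one combination")
    overlap = ref & syn
    precision = len(overlap) / len(syn)
    recall = len(overlap) / len(ref)
    f1 = 0.0 if not overlap else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference population: every combination listed here is treated as feasible.
reference = [
    ("adult", "employed", "licence"),
    ("adult", "unemployed", "licence"),
    ("adult", "employed", "no_licence"),
    ("child", "not_in_labour_force", "no_licence"),
]

# Hypothetical synthetic sample: one duplicate, and one combination
# ("child", "employed", "licence") that plays the role of a structural zero.
synthetic = [
    ("adult", "employed", "licence"),
    ("adult", "employed", "licence"),
    ("child", "not_in_labour_force", "no_licence"),
    ("child", "employed", "licence"),
]

scores = combination_scores(reference, synthetic)
print(scores)
```

In this toy case the generated structural zero lowers precision, while the two reference combinations the generator never produces lower recall. That is why the two metrics are read as feasibility and diversity respectively.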
Problem

Research questions and friction points this paper is trying to address.

synthetic population
data fusion
sampling zeros
structural zeros
diversity and feasibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Wasserstein GAN
joint population synthesis
multi-source data fusion
inverse gradient penalty
synthetic population generation