🤖 AI Summary
Existing mobility trajectory datasets suffer from low resolution and insufficient semantic information, hindering fine-grained urban planning and traffic management. To address this, we propose the first privacy-preserving, semantically rich synthetic human mobility dataset, which integrates heterogeneous multi-source data, including geographic, sociodemographic, transportation, and mobility features. We introduce a novel cross-domain data fusion framework coupled with an adversarial domain adaptation algorithm to achieve semantic alignment and generalizable transfer across multi-resolution, multimodal data. Leveraging graph neural network-based representation learning and agent-based simulation enhanced by differential privacy, our approach accurately reproduces real-world mobility patterns in case studies of Los Angeles and Egypt. On the I-405 corridor in Los Angeles, the simulation achieves a Mean Absolute Percentage Error of only 5.85% for traffic volume and 4.36% for speed, substantially outperforming baseline methods.
📝 Abstract
Human mobility modeling is critical for urban planning and transportation management, yet existing datasets often lack the resolution and semantic richness required for comprehensive analysis. To address this, we propose a cross-domain data fusion framework that integrates multi-modal data differing in nature and spatio-temporal resolution, including geographical, mobility, socio-demographic, and traffic information, to construct a privacy-preserving and semantically enriched human travel trajectory dataset. The framework is demonstrated through two case studies, in Los Angeles (LA) and Egypt, where a domain adaptation algorithm ensures its transferability across diverse urban contexts. Quantitative evaluation shows that the generated synthetic dataset accurately reproduces mobility patterns observed in empirical data. Moreover, large-scale traffic simulations for LA County based on the generated synthetic demand align well with observed traffic. On California's I-405 corridor, the simulation yields a Mean Absolute Percentage Error of 5.85% for traffic volume and 4.36% for speed compared with Caltrans PeMS observations.
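For readers unfamiliar with the metric, the Mean Absolute Percentage Error reported above can be illustrated with a minimal sketch. It assumes the standard definition (the mean of |simulated - observed| / |observed|, expressed in percent); the abstract does not state how observations were aggregated across detectors or time intervals, and the function name `mape` and the sample values below are hypothetical, not taken from the paper.

```python
from typing import Sequence


def mape(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Return the Mean Absolute Percentage Error in percent.

    Assumes the standard definition; observed values must be non-zero
    because each error is normalized by the observation.
    """
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one observation is required")
    if any(o == 0 for o in observed):
        raise ValueError("observed values must be non-zero")
    total = sum(abs(s - o) / abs(o) for o, s in zip(observed, simulated))
    return 100.0 * total / len(observed)


# Hypothetical traffic volumes in vehicles per hour (made-up numbers,
# not PeMS data), used only to show how the metric behaves.
observed_volume = [1000.0, 2000.0, 4000.0]
simulated_volume = [1100.0, 1900.0, 4000.0]

volume_mape = mape(observed_volume, simulated_volume)
print(f"Volume MAPE: {volume_mape:.2f}%")
```

Because each error is divided by the observed value, intervals with very low observed volume or speed can dominate the average, which is worth keeping in mind when comparing MAPE figures across corridors.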