🤖 AI Summary
Real-world human mobility data are often hindered by privacy concerns, noise, and trajectory incompleteness, limiting their utility in transportation and urban computing research. To address these challenges, this work presents a high-fidelity, large-scale synthetic mobility dataset generated through agent-based simulation. Integrating OpenStreetMap road networks, GTFS schedules from over 40 transit agencies, and demographic statistics, the framework synthesizes multimodal trajectories (bus, rail, driving, cycling, and walking) for 500,000 virtual agents across the San Francisco Bay Area over 70 days at a 1 Hz temporal resolution. Each trajectory is annotated with transportation mode and semantic activity labels, and each agent carries demographic attributes. The resulting dataset comprises 3.024 trillion noise-free, complete trajectory records. Because the data are fully simulated, they combine fine-grained detail and full observability without exposing any real individual's movements, offering a high-quality benchmark resource for the research community.
📝 Abstract
We introduce SF-LIFE, a large-scale simulated movement dataset designed to accelerate research in transportation, mobility, and machine learning. The dataset contains 3,024,000,000,000 location records capturing complete, noise-free, multimodal trajectories of 500,000 simulated agents, sampled at 1 Hz as they navigate the San Francisco Bay Area network over a 70-day period. The data captures (1) needs-driven daily agendas of individual agents, generated by an agent-based simulation of human patterns of life, and (2) detailed kinematic trajectories that move agents across the OpenStreetMap representation of San Francisco using data from 40+ transit agencies across 9 counties. SF-LIFE provides unprecedented scale and detail: trajectories are based on real transit infrastructure described by San Francisco General Transit Feed Specification (GTFS) data, and agents move across multiple modes, including bus, rail, bike, automobile, and walking. For this high-fidelity simulated representation of San Francisco, we provide (1) the full trajectory data annotated with transportation mode labels, (2) reduced-size versions of the trajectory data at lower temporal frequency, (3) agent activity information describing the activity that causes an agent to visit a place, (4) agent demographic data, and (5) the underlying OSM road network and building data. As the first dataset of its scale and level of detail, SF-LIFE overcomes the privacy, noise, and completeness limitations inherent in real-world tracking data, providing a robust and ethically sourced resource for research in transit optimization, human mobility analysis, and urban computing.
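The headline record count can be checked with simple arithmetic. The sketch below assumes every agent is observed continuously, 24 hours a day, for all 70 days; the abstract does not state this explicitly, but it is the reading under which the stated total is reproduced exactly. The function name `record_count` and the 10 s and 60 s intervals are illustrative assumptions, not the sampling rates of the released reduced-size versions.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def record_count(agents: int, days: int, interval_s: int = 1) -> int:
    """Location records produced when each agent is sampled every interval_s seconds.

    Assumes continuous observation over full days (an assumption, see above).
    """
    return agents * days * (SECONDS_PER_DAY // interval_s)


# Full-resolution dataset: 500,000 agents, 70 days, 1 Hz.
full = record_count(500_000, 70)
print(f"1 s interval: {full:,} records")

# Hypothetical downsampled variants, to show how the size scales with the interval.
for interval_s in (10, 60):
    reduced = record_count(500_000, 70, interval_s)
    print(f"{interval_s} s interval: {reduced:,} records")
```

At 1 Hz this gives 500,000 × 70 × 86,400 = 3,024,000,000,000 records, matching the figure in the abstract. The record count shrinks linearly with the sampling interval, which is why reduced-frequency versions are practical for users who cannot handle the full data.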