🤖 AI Summary
High-quality individual mobility data are scarce, which hinders urban mobility modeling and evidence-based policy research. To address this, we introduce and release a GDPR-compliant, high-precision, multimodal mobility dataset covering one week of movement for 3,337 residents of the Greater Paris region. The dataset comprises about 500 million raw GNSS trajectory points, over 80,000 annotated trips, and detailed demographic, socioeconomic, and household attributes. Methodologically, we combine GNSS tracking, algorithmic trip inference, human validation (participant logbooks and follow-up telephone interviews), statistical weighting for representativeness, and an anonymization pipeline applied to the GPS traces. The resulting data are organized into three interoperable databases (individuals, trips, and raw GPS traces) that support population-level mobility analysis. Access for research use requires accepting the challenge's Terms and Conditions and signing a Non-Disclosure Agreement, and the dataset is intended as a benchmark for urban computing and sustainable transportation research in Europe.
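The summary mentions algorithmic trip inference without describing the algorithm. Purely as an illustration, the sketch below uses stay-point detection, a common technique for segmenting GNSS traces into trips; it is not necessarily the method used for EMG 2023. The `Point` type, the 200 m radius, and the 300 s dwell threshold are hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical GNSS fix: timestamp in seconds, WGS84 latitude/longitude in degrees."""
    t: float
    lat: float
    lon: float


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in metres between two fixes."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def detect_stops(points, radius_m=200.0, min_dwell_s=300.0):
    """Return inclusive (start, end) index pairs of stays.

    A stay is a run of consecutive fixes that all lie within radius_m of the
    run's first fix and that spans at least min_dwell_s seconds.
    """
    stops = []
    i, n = 0, len(points)
    while i < n:
        j = i
        # Extend the candidate stay while the next fix is still close to its anchor.
        while j + 1 < n and haversine_m(points[i], points[j + 1]) <= radius_m:
            j += 1
        if points[j].t - points[i].t >= min_dwell_s:
            stops.append((i, j))
            i = j + 1  # continue after the detected stay
        else:
            i += 1  # too short to be a stay; slide the anchor forward
    return stops


def split_trips(points, radius_m=200.0, min_dwell_s=300.0):
    """Cut the trace into trips running from the end of one stay to the start of the next.

    Movement before the first stay or after the last stay is ignored in this sketch.
    """
    stops = detect_stops(points, radius_m, min_dwell_s)
    return [points[e0:s1 + 1] for (_, e0), (s1, _) in zip(stops, stops[1:])]
```

Anchoring distances to the first fix of a candidate stay, rather than to a running centroid, keeps the sketch short but can split long stays with drifting positions; a production pipeline would also need to handle signal gaps and indoor noise.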
📝 Abstract
High-quality mobility data remains scarce despite growing interest from researchers and urban stakeholders in understanding individual-level movement patterns. The Netmob25 Data Challenge addresses this gap by releasing a unique GPS-based mobility dataset derived from the EMG 2023 GNSS-based mobility survey conducted in the Île-de-France region (Greater Paris area), France. This dataset captures detailed daily mobility over a full week for 3,337 volunteer residents aged 16 to 80, collected between October 2022 and May 2023. Each participant was equipped with a dedicated GPS tracking device configured to record a location point every 2 to 3 seconds and was asked to keep a digital or paper logbook of their trips. The recorded traces were processed algorithmically to infer trips, which were then validated through follow-up phone interviews. The dataset includes three components: (i) an Individuals database describing demographic, socioeconomic, and household characteristics; (ii) a Trips database with over 80,000 annotated trips including timestamps, transport modes, and trip purposes; and (iii) a Raw GPS Traces database comprising about 500 million high-frequency points. A statistical weighting mechanism is provided to support population-level estimates. An extensive anonymization pipeline was applied to the GPS traces to ensure GDPR compliance while preserving analytical value. Access to the dataset requires accepting the challenge's Terms and Conditions and signing a Non-Disclosure Agreement. This paper describes the survey design, collection protocol, processing methodology, and characteristics of the released dataset.
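To show how per-person survey weights are typically used for population-level estimates, here is a minimal sketch of a weighted mean, sum(w_i * y_i) / sum(w_i), applied to trips per person, plus a trip-level mode share in which each trip inherits its traveller's weight. The field names `person_id` and `mode`, and the assumption of a single weight per individual, are hypothetical; the released schema and the exact definition of the weights may differ.

```python
from collections import Counter, defaultdict


def weighted_trips_per_person(individuals, trips):
    """Weighted mean number of trips per person.

    individuals: dict mapping a hypothetical person_id to that person's survey weight.
    trips: list of dicts, each with at least a 'person_id' key.
    People with no recorded trips count as zero, so they still enter the denominator.
    """
    counts = Counter(t["person_id"] for t in trips)
    total_w = sum(individuals.values())
    return sum(w * counts.get(pid, 0) for pid, w in individuals.items()) / total_w


def weighted_mode_share(individuals, trips):
    """Share of trips by mode, each trip weighted by its traveller's weight."""
    by_mode = defaultdict(float)
    for t in trips:
        by_mode[t["mode"]] += individuals[t["person_id"]]
    total = sum(by_mode.values())
    return {mode: w / total for mode, w in by_mode.items()}
```

For example, with weights 2.0, 1.0, and 1.0 for three people making 2, 3, and 0 trips, the weighted mean is (2*2 + 1*3 + 1*0) / 4 = 1.75 trips, whereas the unweighted mean is 5/3.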