AI Summary
To address the challenges of large-scale, highly heterogeneous vehicular data and the lack of cross-manufacturer standardization in connected vehicle environments, this study develops a scalable big data processing framework that integrates distributed computing, multi-source data cleaning, and semantic standardization techniques. The framework efficiently processes 120 million trips (covering 1.4 billion miles) from a Virginia connected vehicle dataset collected between August 2021 and August 2022, spanning vehicles from multiple OEMs. We release, for the first time, open trip-level and segment-level aggregated datasets, alongside an interactive visualization platform that systematically uncovers spatiotemporal patterns in travel behavior and the evolution of roadway speeds. Our contributions include: (1) an end-to-end processing paradigm tailored to real-world onboard sensor data; (2) a systematic solution to data heterogeneity and scalability challenges; and (3) a best-practice blueprint for deploying intelligent transportation applications grounded in empirical, production-grade data infrastructure.
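The summary mentions multi-source cleaning and semantic standardization but does not describe either step. The sketch below is only an illustration of the general idea: raw records from different manufacturers are mapped onto one common schema with consistent units and UTC timestamps, and implausible records are dropped. Every OEM label, field name, and cleaning threshold in it (`oem_a`, `oem_b`, `spd_kph`, the 150 mph cap, and so on) is a hypothetical assumption. None of them comes from the paper.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

KM_PER_MILE = 1.609344  # exact definition of the international mile


@dataclass(frozen=True)
class NormalizedWaypoint:
    """Hypothetical OEM-independent waypoint schema."""
    trip_id: str
    ts: datetime      # timezone-aware, UTC
    lat: float
    lon: float
    speed_mph: float


def _plausible(lat: float, lon: float, speed_mph: float) -> bool:
    # Illustrative cleaning rules: valid coordinates and a bounded, non-negative speed.
    return -90 <= lat <= 90 and -180 <= lon <= 180 and 0 <= speed_mph <= 150


def normalize(record: dict, oem: str) -> Optional[NormalizedWaypoint]:
    """Map one raw record from a hypothetical OEM feed onto the common schema.

    Returns None if the record is malformed, has an ambiguous timestamp,
    or fails the plausibility checks.
    """
    try:
        if oem == "oem_a":
            # Assumed layout: epoch milliseconds, speed in km/h.
            ts = datetime.fromtimestamp(record["epoch_ms"] / 1000, tz=timezone.utc)
            lat, lon = float(record["lat"]), float(record["lng"])
            speed = float(record["spd_kph"]) / KM_PER_MILE
            trip = str(record["journey"])
        elif oem == "oem_b":
            # Assumed layout: ISO-8601 timestamp with an explicit offset, speed in mph.
            ts = datetime.fromisoformat(record["time"])
            if ts.tzinfo is None:
                return None  # reject naive timestamps rather than guess a time zone
            ts = ts.astimezone(timezone.utc)
            lat, lon = float(record["latitude"]), float(record["longitude"])
            speed = float(record["speed_mph"])
            trip = str(record["trip_id"])
        else:
            return None  # unknown source: no mapping defined
    except (KeyError, TypeError, ValueError):
        return None
    if not _plausible(lat, lon, speed):
        return None
    return NormalizedWaypoint(trip, ts, lat, lon, round(speed, 2))
```

Returning `None` rather than raising lets a batch job count and drop bad records per source. This is one reasonable choice at this scale. It is not necessarily the one the authors made.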
Abstract
Over 90% of new vehicles in the United States now collect and transmit telematics data, and similar trends are seen in other developed countries. Transportation planners have previously used telematics data in various forms, but its current scale offers significant new opportunities in traffic measurement, classification, planning, and control. Despite these opportunities, the enormous volume of data and the lack of standardization across manufacturers necessitate a clearer understanding of the data and improved processing methods for extracting actionable insights.
This paper takes a step toward addressing these needs through four primary objectives. First, a data processing pipeline was built to efficiently analyze 1.4 billion miles (120 million trips) of telematics data collected in Virginia between August 2021 and August 2022. Second, an open data repository of trip-level and roadway-segment-level summaries was created. Third, interactive visualization tools were designed to extract insights from these data about trip-taking behavior and the speed profiles of roadways. Finally, the major challenges encountered while processing these data are summarized, and recommendations to overcome them are provided. This work will help both the manufacturers collecting the data and the transportation professionals using it to better understand its possibilities and the major pitfalls to avoid.
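The abstract does not specify how the trip-level and roadway-segment-level summaries are computed. The sketch below shows one plausible shape for those two aggregations, written in plain Python for clarity:

- **Trip summary:** start time, end time, duration, and distance, with distance taken as the sum of great-circle (haversine) distances between time-ordered waypoints.
- **Segment summary:** median observed speed per segment and hour of day.

The input layout (`Waypoint` with a precomputed `segment_id`) is a hypothetical assumption. So are the choice of median speed and the use of UTC hours. A real pipeline would also need map matching and local-time conversion, and neither is shown here.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, NamedTuple, Tuple

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


class Waypoint(NamedTuple):
    """Hypothetical cleaned waypoint, already map-matched to a roadway segment."""
    trip_id: str
    epoch_s: float
    lat: float
    lon: float
    speed_mph: float
    segment_id: str


def haversine_mi(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def trip_summaries(waypoints: Iterable[Waypoint]) -> Dict[str, dict]:
    """One summary row per trip: start, end, duration, distance, point count."""
    by_trip = defaultdict(list)
    for wp in waypoints:
        by_trip[wp.trip_id].append(wp)
    out = {}
    for trip_id, pts in by_trip.items():
        pts.sort(key=lambda p: p.epoch_s)  # feeds may deliver points out of order
        miles = sum(haversine_mi(a.lat, a.lon, b.lat, b.lon) for a, b in zip(pts, pts[1:]))
        out[trip_id] = {
            "start": pts[0].epoch_s,
            "end": pts[-1].epoch_s,
            "duration_min": (pts[-1].epoch_s - pts[0].epoch_s) / 60,
            "miles": miles,
            "n_points": len(pts),
        }
    return out


def segment_speed_profiles(waypoints: Iterable[Waypoint]) -> Dict[Tuple[str, int], dict]:
    """Median observed speed and observation count per (segment_id, UTC hour of day)."""
    speeds = defaultdict(list)
    for wp in waypoints:
        hour = int(wp.epoch_s // 3600) % 24  # UTC hour; local-time conversion omitted
        speeds[(wp.segment_id, hour)].append(wp.speed_mph)
    return {key: {"median_mph": median(v), "n_obs": len(v)} for key, v in speeds.items()}
```

Both functions are group-by operations keyed on `trip_id` or `(segment_id, hour)`. In a distributed engine they would map naturally onto partitioned aggregations. This is a general property of the approach, not a claim about the authors' implementation.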