Extracting Insights from Large-Scale Telematics Data for ITS Applications: Lessons and Recommendations

2025-07-18
AI Summary
To address the challenges of large-scale, highly heterogeneous vehicular data and the lack of cross-manufacturer standardization in connected vehicle environments, this study develops a scalable big data processing framework integrating distributed computing, multi-source data cleaning, and semantic standardization techniques. The framework efficiently processes 120 million trips (covering 1.4 billion miles) from Virginia's 2021–2022 connected vehicle dataset, spanning vehicles across multiple OEMs. The authors release open trip-level and segment-level aggregated datasets, alongside interactive visualization tools that uncover spatiotemporal patterns in travel behavior and roadway speed profiles. The contributions include: (1) an end-to-end processing pipeline tailored to real-world onboard sensor data; (2) a systematic treatment of data heterogeneity and scalability challenges; and (3) a set of lessons and recommendations for deploying intelligent transportation applications on production-scale telematics data.

πŸ“ Abstract
Over 90% of new vehicles in the United States now collect and transmit telematics data. Similar trends are seen in other developed countries. Transportation planners have previously utilized telematics data in various forms, but its current scale offers significant new opportunities in traffic measurement, classification, planning, and control. Despite these opportunities, the enormous volume of data and the lack of standardization across manufacturers necessitate a clearer understanding of the data and improved data processing methods for extracting actionable insights. This paper takes a step towards addressing these needs through four primary objectives. First, a data processing pipeline was built to efficiently analyze 1.4 billion miles (120 million trips) of telematics data collected in Virginia between August 2021 and August 2022. Second, an open data repository of trip and roadway segment level summaries was created. Third, interactive visualization tools were designed to extract insights from these data about trip-taking behavior and the speed profiles of roadways. Finally, the major challenges faced while processing these data are summarized, and recommendations to overcome them are provided. This work will help manufacturers collecting the data and transportation professionals using the data to develop a better understanding of the possibilities and the major pitfalls to avoid.
Problem

Research questions and friction points this paper is trying to address.

Processing large-scale telematics data for traffic insights
Standardizing and extracting actionable insights from diverse data sources
Developing tools to analyze trip behavior and roadway speeds
Innovation

Methods, ideas, or system contributions that make the work stand out.

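Built a pipeline to analyze 1.4 billion miles (120 million trips) of telematics data
Created an open repository of trip-level and roadway-segment-level summaries
Designed interactive visualization tools for trip-taking behavior and roadway speed profiles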
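
To make the idea of trip-level and segment-level summaries concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the record layout (trip_id, segment_id, timestamp, speed, odometer), the function names, and the sample values are all hypothetical and chosen only for illustration. A production system at this scale would presumably run on a distributed engine rather than on in-memory lists.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical point records, one per telematics sample:
# (trip_id, segment_id, timestamp_s, speed_mph, odometer_miles)
records = [
    ("t1", "segA", 0, 30.0, 0.0),
    ("t1", "segA", 60, 40.0, 0.5),
    ("t1", "segB", 120, 50.0, 1.5),
    ("t2", "segA", 0, 20.0, 0.0),
    ("t2", "segA", 90, 30.0, 0.6),
]


def summarize_trips(rows):
    """Collapse point records into one summary per trip."""
    by_trip = defaultdict(list)
    for trip_id, _segment, ts, speed, odometer in rows:
        by_trip[trip_id].append((ts, speed, odometer))

    summaries = {}
    for trip_id, points in by_trip.items():
        points.sort()  # order samples by timestamp
        summaries[trip_id] = {
            "duration_s": points[-1][0] - points[0][0],
            "distance_miles": points[-1][2] - points[0][2],
            "mean_speed_mph": mean(p[1] for p in points),
        }
    return summaries


def summarize_segments(rows):
    """Collapse point records into one summary per roadway segment."""
    speeds = defaultdict(list)
    trips = defaultdict(set)
    for trip_id, segment, _ts, speed, _odometer in rows:
        speeds[segment].append(speed)
        trips[segment].add(trip_id)

    return {
        segment: {
            "n_points": len(values),
            "n_trips": len(trips[segment]),
            "mean_speed_mph": mean(values),
        }
        for segment, values in speeds.items()
    }


trip_summaries = summarize_trips(records)
segment_summaries = summarize_segments(records)
```

Aggregating to trips and segments in this way reduces raw samples to compact tables that contain no individual point records, which is one plausible reason such summaries are suitable for an open repository.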
Gibran Ali
Research Scientist, Virginia Tech Transportation Institute
Transportation, Control Systems, Energy Harvesting, Data Analysis
Neal Feierabend
Virginia Tech Transportation Institute
Prarthana Doshi
Virginia Tech Transportation Institute
Calvin Winkowski
Virginia Tech Transportation Institute
Michael Fontaine
Virginia Department of Transportation