🤖 AI Summary
Traffic data imputation faces three major challenges: complex missingness patterns, inconsistent benchmarks, and overly narrow evaluation criteria. To address these, this work introduces the first spatiotemporal joint missingness pattern taxonomy and establishes a standardized evaluation framework encompassing ten representative model families—including graph neural networks (GNNs), recurrent neural networks (RNNs), Transformers, and matrix factorization methods. We conduct systematic assessments across synthetic and real-world traffic datasets, measuring effectiveness, computational efficiency, and robustness. Through controlled experiments varying both missing rates and patterns, we empirically characterize the performance boundaries of each model class under diverse missingness scenarios—marking the first such comprehensive analysis. Our findings yield a reproducible, comparable model selection guideline for traffic imputation and advance the field toward standardization and practical deployment.
📝 Abstract
Traffic data imputation is a critical preprocessing step in intelligent transportation systems, enabling advanced transportation services. Despite significant advances in this field, selecting the most suitable model for practical applications remains challenging because of three key issues: 1) incomplete consideration of missing patterns, which describe how data are lost along the spatial and temporal dimensions, 2) a lack of testing on standardized datasets, and 3) insufficient evaluation. To this end, we first propose practice-oriented taxonomies for missing patterns and imputation models, systematically identifying all possible forms of real-world traffic data loss and analyzing the characteristics of existing models. Furthermore, we introduce a unified benchmarking pipeline to comprehensively evaluate 10 representative models across various missing patterns and rates. This work aims to provide a holistic understanding of traffic data imputation research and serve as a practical guideline.
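To make the idea of missing patterns and rates concrete, the following is a minimal sketch of how missingness masks along the spatial and temporal dimensions might be simulated and how an imputation could be scored only on the hidden entries. The three pattern names, all function names, and the use of mean absolute error are illustrative assumptions; they are not the taxonomy, pipeline, or metrics defined in the paper.

```python
import random

# Convention (assumed): mask[i][t] is True when sensor i is observed at step t.

def random_mask(n_sensors, n_steps, rate, seed=0):
    """Point-wise missingness: each entry is dropped independently with probability `rate`."""
    rng = random.Random(seed)
    return [[rng.random() >= rate for _ in range(n_steps)] for _ in range(n_sensors)]

def temporal_block_mask(n_sensors, n_steps, block_len, seed=0):
    """Temporal block missingness: every sensor loses one contiguous run of `block_len` steps."""
    rng = random.Random(seed)
    mask = [[True] * n_steps for _ in range(n_sensors)]
    for row in mask:
        start = rng.randrange(n_steps - block_len + 1)
        for t in range(start, start + block_len):
            row[t] = False
    return mask

def spatial_block_mask(n_sensors, n_steps, n_failed, seed=0):
    """Spatial missingness: `n_failed` sensors are missing for the whole period."""
    rng = random.Random(seed)
    failed = set(rng.sample(range(n_sensors), n_failed))
    return [[i not in failed] * n_steps for i in range(n_sensors)]

def missing_rate(mask):
    """Fraction of entries that are hidden."""
    total = sum(len(row) for row in mask)
    observed = sum(sum(row) for row in mask)
    return 1 - observed / total

def masked_mae(truth, imputed, mask):
    """Mean absolute error computed only over the hidden entries."""
    errors = [
        abs(truth[i][t] - imputed[i][t])
        for i, row in enumerate(mask)
        for t, observed in enumerate(row)
        if not observed
    ]
    if not errors:
        raise ValueError("mask hides no entries, so there is nothing to score")
    return sum(errors) / len(errors)
```

In a benchmark of this kind, the same ground-truth data would typically be masked under each pattern at several rates, so that differences between models can be attributed to the missingness scenario rather than to the dataset.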