AI Summary
Addressing the challenges of extremely low sensor coverage (down to 1%), highly zero-inflated traffic flows, heterogeneous road network structures, and ambiguous spatial dependencies in urban transportation, this paper proposes a spatial interpolation method for sparse multimodal traffic flows (cycling and taxi). The method integrates adaptive graph-structured modeling with spatiotemporal graph neural networks. Key contributions include: (1) the first negative binomial loss function explicitly designed for zero-inflated count distributions; (2) functional-role-aware node feature encoding coupled with a masked interpolation training strategy; and (3) the first open-source dual-city multimodal benchmark: Berlin Strava and NYC Taxi. Under severe coverage reduction (90% → 1%), the approach demonstrates strong robustness (Strava MAE: 7.1 → 10.5; Taxi MAE: 23.0 → 40.4), consistently outperforming state-of-the-art methods across MAE, RMSE, true-zero rate, and KL divergence.
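To make the idea of a negative binomial loss for zero-inflated counts more concrete, the following is a minimal sketch of a zero-inflated negative binomial negative log-likelihood for a single observation. The parameterization (mean `mu`, dispersion `r`, structural-zero probability `pi`) and the function names are assumptions chosen for illustration; the summary does not specify the exact form used in the paper.

```python
import math


def nb_log_pmf(y: int, mu: float, r: float) -> float:
    """Log-probability of count y under a negative binomial distribution.

    Assumed parameterization: mean mu > 0 and dispersion r > 0,
    so that the variance is mu + mu**2 / r.
    """
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def zinb_nll(y: int, mu: float, r: float, pi: float) -> float:
    """Negative log-likelihood of y under a zero-inflated negative binomial.

    pi (0 <= pi < 1) is the assumed probability of a structural zero,
    i.e. a zero that does not come from the count process itself.
    """
    nb_prob = math.exp(nb_log_pmf(y, mu, r))
    if y == 0:
        # A zero can be structural or can come from the count process.
        prob = pi + (1.0 - pi) * nb_prob
    else:
        # A positive count can only come from the count process.
        prob = (1.0 - pi) * nb_prob
    return -math.log(prob)


# Usage: an observed zero is penalized less when structural zeros are allowed.
loss_without_inflation = zinb_nll(0, mu=2.0, r=1.5, pi=0.0)
loss_with_inflation = zinb_nll(0, mu=2.0, r=1.5, pi=0.3)
```

In a trained model, `mu`, `r`, and `pi` would typically be predicted per node and time step, and the loss averaged over all supervised observations.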
Abstract
Reliable street-level traffic volume data covering multiple modes of transportation supports urban planning by informing decisions on infrastructure improvements, traffic management, and public transportation. Yet sensors measuring traffic volume are typically sparsely located because of their high deployment and maintenance costs. To address this, interpolation methods can estimate traffic volumes at unobserved locations using available data. Graph Neural Networks have shown strong performance in traffic volume forecasting, particularly on highways and major arterial networks. Applying them to urban settings, however, presents unique challenges: urban networks exhibit greater structural diversity, traffic volumes are highly overdispersed with many zeros, the best way to account for spatial dependencies remains unclear, and sensor coverage is often very sparse. We introduce the Graph Neural Network for Urban Interpolation (GNNUI), a novel urban traffic volume estimation approach. GNNUI employs a masking algorithm to learn interpolation, integrates node features to capture functional roles, and uses a loss function tailored to zero-inflated traffic distributions. In addition to the model, we introduce two new open, large-scale urban traffic volume benchmarks covering different transportation modes: Strava cycling data from Berlin and New York City taxi data. GNNUI outperforms recent interpolation methods, some of them graph-based, across metrics (MAE, RMSE, true-zero rate, Kullback-Leibler divergence) and remains robust as sensor coverage drops from 90% to 1%. On Strava, for instance, MAE rises only from 7.1 to 10.5, and on Taxi from 23.0 to 40.4, demonstrating strong performance under the extreme data scarcity common in real-world urban settings. We also examine how graph connectivity choices influence model accuracy.
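The masking idea mentioned in the abstract can be illustrated with a small sketch: some observed sensors are hidden during training, the model sees only the remaining ones, and the error is computed on the hidden sensors. This is not the GNNUI algorithm itself. The helper names `mask_sensors` and `masked_mae`, the toy volumes, and the mean-of-visible-sensors baseline are all hypothetical stand-ins for illustration.

```python
import random


def mask_sensors(observed_ids, mask_fraction, rng):
    """Split observed sensor ids into (visible, masked) sets.

    Assumes observed_ids is non-empty and 0 < mask_fraction <= 1.
    """
    ids = sorted(observed_ids)
    n_masked = max(1, round(mask_fraction * len(ids)))
    masked = set(rng.sample(ids, n_masked))
    visible = set(ids) - masked
    return visible, masked


def masked_mae(predictions, targets, masked):
    """Mean absolute error computed over the masked sensors only."""
    return sum(abs(predictions[i] - targets[i]) for i in masked) / len(masked)


# Toy example: hourly volumes at six sensors (made-up numbers).
volumes = {0: 0.0, 1: 12.0, 2: 0.0, 3: 7.0, 4: 3.0, 5: 0.0}
visible, masked = mask_sensors(volumes.keys(), 0.5, random.Random(0))

# Hypothetical baseline in place of a graph model: predict the mean of
# the visible sensors at every masked location.
baseline = sum(volumes[i] for i in visible) / len(visible)
predictions = {i: baseline for i in masked}
error = masked_mae(predictions, volumes, masked)
```

Drawing a fresh mask for each training batch exposes the model to many different observed/unobserved splits, which is what allows it to be evaluated later on locations that truly have no sensor.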