ADFormer: Aggregation Differential Transformer for Passenger Demand Forecasting

📅 2025-06-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing attention-based demand forecasting models struggle to capture complex spatiotemporal dependencies, particularly higher-order semantic correlations—such as periodicity and functional-zone associations. To address this, we propose ADFormer: (1) a differential attention mechanism that explicitly models spatial heterogeneity while suppressing noise; (2) a spatiotemporal heterogeneous aggregation module that jointly integrates raw proximity dependencies and multi-order semantic correlations; and (3) spatiotemporal decoupled encoding coupled with multi-scale feature alignment. Evaluated on taxi and shared-bike datasets, ADFormer achieves an average 8.2% improvement in prediction accuracy over state-of-the-art methods, alongside a 15% reduction in inference latency. The implementation is publicly available.

📝 Abstract
Passenger demand forecasting helps optimize vehicle scheduling, thereby improving urban efficiency. Recently, attention-based methods have been used to adequately capture the dynamic nature of spatio-temporal data. However, existing methods that rely on heuristic masking strategies cannot fully adapt to the complex spatio-temporal correlations, hindering the model from focusing on the right context. These works also overlook the high-level correlations that exist in the real world. Effectively integrating these high-level correlations with the original correlations is crucial. To fill this gap, we propose the Aggregation Differential Transformer (ADFormer), which offers new insights for demand forecasting. Specifically, we utilize Differential Attention to capture the original spatial correlations and achieve attention denoising. Meanwhile, we design distinct aggregation strategies based on the nature of space and time. Then, the original correlations are unified with the high-level correlations, enabling the model to capture holistic spatio-temporal relations. Experiments conducted on taxi and bike datasets confirm the effectiveness and efficiency of our model, demonstrating its practical value. The code is available at https://github.com/decisionintelligence/ADFormer.
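The "attention denoising" the abstract refers to follows the differential-attention idea: compute two separate softmax attention maps and subtract one (scaled by a learnable λ) from the other, so that noise common to both maps cancels. A minimal NumPy sketch of that mechanism is below; the weight matrices, dimensions, and fixed λ are illustrative assumptions, not ADFormer's actual parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def differential_attention(x, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5):
    """Subtract a second attention map from the first; common-mode
    attention noise cancels, sharpening the remaining scores."""
    d = Wq1.shape[1]
    a1 = softmax((x @ Wq1) @ (x @ Wk1).T / np.sqrt(d))
    a2 = softmax((x @ Wq2) @ (x @ Wk2).T / np.sqrt(d))
    return (a1 - lam * a2) @ (x @ Wv)

rng = np.random.default_rng(0)
n, d_model, d_head = 6, 8, 4          # 6 toy regions, small dims
x = rng.standard_normal((n, d_model))
Ws = [rng.standard_normal((d_model, d_head)) for _ in range(5)]
out = differential_attention(x, *Ws)
print(out.shape)  # (6, 4)
```

In practice λ is learned per head and the outputs are normalized, but the core operation is this difference of two attention distributions.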
Problem

Research questions and friction points this paper is trying to address.

Existing methods fail to adapt to complex spatio-temporal correlations
Current approaches overlook high-level real-world correlations
Need to integrate high-level and original correlations effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differential Attention for spatial correlation denoising
Distinct aggregation strategies for space and time
Unified original and high-level spatio-temporal correlations
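The third bullet, unifying original with high-level correlations, can be illustrated by pooling region embeddings into coarser units (e.g. functional zones) and broadcasting those zone embeddings back to their member regions for fusion. The sketch below is a hypothetical simplification: the hard `assign` mapping and mean pooling stand in for whatever learned aggregation ADFormer actually uses.

```python
import numpy as np

def aggregate_regions(region_emb, assign):
    """Mean-pool region embeddings into zone embeddings, then
    concatenate each region's own embedding with its zone's,
    fusing original and high-level correlations."""
    n_zones = assign.max() + 1
    d = region_emb.shape[1]
    zone_emb = np.zeros((n_zones, d))
    for z in range(n_zones):
        zone_emb[z] = region_emb[assign == z].mean(axis=0)
    fused = np.concatenate([region_emb, zone_emb[assign]], axis=1)
    return zone_emb, fused

emb = np.arange(12, dtype=float).reshape(6, 2)   # 6 regions, dim 2
assign = np.array([0, 0, 1, 1, 2, 2])            # 3 functional zones
zones, fused = aggregate_regions(emb, assign)
print(zones.shape, fused.shape)  # (3, 2) (6, 4)
```

Attention can then run over the fused representations, so each region attends with awareness of both its local neighborhood and its zone-level context.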
Haichen Wang
East China Normal University
Liu Yang
East China Normal University
Xinyuan Zhang
East China Normal University
Haomin Yu
University of Salford
data mining · spatio-temporal mining · multi-task learning
Ming Li
INSPUR Co., Ltd
Jilin Hu
Professor, East China Normal University
Spatial-Temporal Data · Machine Learning · Transportation