Spatiotemporal Multi-Task Graph Transformer for Trip-Level Transit Prediction

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses limitations of existing bus passenger flow prediction methods, which struggle to capture nonlinear spatiotemporal dependencies among stops and lines and whose fixed temporal, spatial, or stop-level formulations overlook within-trip dynamics and network context. The authors propose SMT-GraphFormer, a trip-level sequence-to-sequence forecasting framework that combines multi-relational graph embeddings, a Graph Transformer architecture, a context encoder for temporal and weather information, and a multi-gate mixture-of-experts module. A multi-task learning strategy jointly predicts boarding and alighting counts, with delay and dwell time prediction serving as encoder-side auxiliary tasks that support representation learning. Experiments on real-world bus data from Trondheim, Norway, show that the method outperforms stop-level tabular baselines: R² improves by 0.24 on alighting prediction, with consistent improvements on boarding, delay, and dwell time.
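To make the reported R² gain concrete, the following is a minimal sketch of how the coefficient of determination is computed. The counts are invented for illustration and are not taken from the paper; `observed`, `baseline`, and `improved` are hypothetical names.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical per-stop passenger counts along one trip (not from the paper).
observed = [2, 0, 5, 3, 1, 4]
baseline = [3, 2, 3, 2, 3, 2]   # made-up predictions from a weaker model
improved = [2, 1, 4, 3, 2, 3]   # made-up predictions from a stronger model

gain = r2_score(observed, improved) - r2_score(observed, baseline)
print(f"R^2 gain: {gain:.2f}")
```

An R² of 1 means the predictions match the observations exactly, while 0 means the model does no better than always predicting the mean count, so a difference in R² is read as a difference in the share of variance explained.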

📝 Abstract

Passenger count data from public transit systems reveals urban mobility patterns and is essential for planning, operation, and optimisation. However, non-linear spatiotemporal interdependencies across stops and lines make modelling and prediction challenging. Existing approaches often rely on fixed temporal, spatial, or stop-level formulations, limiting their ability to capture within-trip evolution and network context. This study proposes SMT-GraphFormer, a spatiotemporal multi-task graph transformer that frames trip-level transit prediction as sequence-to-sequence modelling. Given a line's stop sequence and trip-level context, the model predicts successive boarding and alighting counts, with delay and dwell time treated as encoder-side surrogate tasks. Key components include graph embeddings for multi-relational stop similarity, a context encoder for weather and temporal information, and a multi-gate mixture-of-experts module that produces task-specific decoder representations for boarding and alighting predictions. Evaluation on public bus transit data from Trondheim, Norway, shows that SMT-GraphFormer outperforms stop-level tabular benchmarks, with ablation studies examining each component's contribution. The sequential formulation yields substantial gains on alighting prediction ($+0.24$ in $R^2$) and consistent improvements on boarding, delay, and dwell, confirming the value of explicit trip-level sequential bias and inter-target dependencies. These findings demonstrate the potential of transformer-based sequence modelling for capturing complex spatiotemporal dynamics in public transit and underscore the value of architectures tailored to transit data rather than off-the-shelf tabular models. The proposed framework provides a horizon-agnostic basis for scenario analysis in digital twin environments, supporting informed decision-making by planners and transit operators.
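The multi-gate mixture-of-experts idea mentioned in the abstract can be sketched generically as follows. This is not the authors' implementation: the layer sizes, random weights, ReLU experts, linear task heads, and the class name `MMoE` are all assumptions chosen for illustration, and the input `h` merely stands in for per-stop hidden states of one trip.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class MMoE:
    """Generic multi-gate mixture-of-experts layer (illustrative only)."""

    def __init__(self, d_in, d_expert, n_experts, tasks, seed=0):
        rng = np.random.default_rng(seed)
        # Experts are shared across tasks; each task has its own gate and head.
        self.W_experts = rng.normal(0, 0.1, size=(n_experts, d_in, d_expert))
        self.W_gates = {t: rng.normal(0, 0.1, size=(d_in, n_experts)) for t in tasks}
        self.W_heads = {t: rng.normal(0, 0.1, size=(d_expert, 1)) for t in tasks}

    def forward(self, h):
        # h: (n_stops, d_in) hidden states, one row per stop in the trip.
        # expert_out: (n_stops, n_experts, d_expert), ReLU-activated.
        expert_out = np.maximum(0.0, np.einsum("sd,edk->sek", h, self.W_experts))
        preds, gates = {}, {}
        for task, W_gate in self.W_gates.items():
            gate = softmax(h @ W_gate, axis=-1)              # (n_stops, n_experts)
            mixed = np.einsum("se,sek->sk", gate, expert_out)  # task-specific mix
            preds[task] = (mixed @ self.W_heads[task]).squeeze(-1)
            gates[task] = gate
        return preds, gates

# Hypothetical trip with 12 stops and 16-dimensional hidden states.
h = np.random.default_rng(1).normal(size=(12, 16))
model = MMoE(d_in=16, d_expert=8, n_experts=4, tasks=("boarding", "alighting"))
preds, gates = model.forward(h)
print(preds["boarding"].shape, preds["alighting"].shape)
```

Because each task has its own gate over a shared pool of experts, boarding and alighting can reuse common structure while weighting experts differently at each stop, which is the general motivation for this kind of layer in multi-task settings.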

Problem

Research questions and friction points this paper is trying to address.

trip-level prediction

spatiotemporal dynamics

public transit

passenger flow

graph-based modelling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph Transformer

Spatiotemporal Multi-Task Learning

Trip-Level Prediction