RIDE: An Open Dataset and Benchmark for Train Delay Prediction

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Progress in train delay prediction has been hard to assess because the field lacks standardized datasets, task formulations, and evaluation protocols. This work introduces RIDE, an open dataset and benchmark covering the entire Belgian railway network, which integrates 94.5 million train events, 3.6 million journeys, and 35.7 million weather records from 2023 to 2025. RIDE is organized as a layered pipeline that turns raw railway and weather data into a reusable intermediate relational dataset and model-ready benchmark datasets, and it standardizes the prediction task, the training and testing data, and the evaluation protocol. This enables direct comparison of non-learning, statistical learning, and deep learning models. Experiments show that learning-based methods clearly outperform non-learning baselines, with graph neural networks achieving the best mean performance while the strongest learning-based models remain close to one another. Results are reported as MAE and RMSE, with breakdowns by prediction horizon and delay change.

📝 Abstract

Train delay prediction is an important problem for both passengers and railway operators, yet progress in the field remains difficult to assess due to the lack of standardized datasets, prediction targets, and evaluation protocols. To address this gap, we introduce RIDE, an open dataset and benchmark for train delay prediction built at nationwide scale over the Belgian railway network. RIDE covers 94.5M train events, 3.6M journeys, and 35.7M weather records from 2023 to 2025. It is organized as a layered data pipeline from raw railway and weather sources to two public releases: a reusable intermediate relational dataset and model-ready benchmark datasets. The benchmark standardizes the prediction task and the training and testing data. It also provides a unified evaluation protocol that supports direct comparison across models. Using this framework, we provide the first comprehensive comparative evaluation of non-learning, statistical learning, and deep learning models. We show that learning-based methods clearly outperform non-learning models, with graph neural networks achieving the best mean performance, while the strongest learning-based models remain relatively close to one another. Beyond aggregate mean absolute error (MAE) and root mean squared error (RMSE), the framework also provides breakdowns by prediction horizon and delay change, enabling more detailed analysis of model behavior across forecasting regimes.
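To make the evaluation idea concrete, here is a minimal sketch of computing MAE and RMSE per prediction-horizon bin. It is not RIDE's actual protocol: the record layout `(horizon, predicted_delay, actual_delay)`, the minute units, the bin edges, and the function name `breakdown_by_horizon` are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def mae(errors):
    """Mean absolute error over a list of signed errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error over a list of signed errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def breakdown_by_horizon(records, bin_edges):
    """Group prediction errors by horizon bin and score each bin.

    records   -- iterable of (horizon, predicted_delay, actual_delay),
                 all in minutes (hypothetical layout, not RIDE's schema)
    bin_edges -- ascending upper bounds; a record falls into the first
                 bin whose bound is >= its horizon, else the overflow bin
    """
    buckets = defaultdict(list)
    for horizon, predicted, actual in records:
        label = next(
            (f"<={edge}min" for edge in bin_edges if horizon <= edge),
            f">{bin_edges[-1]}min",
        )
        buckets[label].append(predicted - actual)
    return {
        label: {"mae": mae(errs), "rmse": rmse(errs), "n": len(errs)}
        for label, errs in buckets.items()
    }


# Toy data, invented for this example only.
records = [
    (5, 2.0, 3.0),
    (10, 4.0, 1.0),
    (25, 0.0, 4.0),
    (50, 6.0, 6.0),
    (90, 1.0, 7.0),
]
result = breakdown_by_horizon(records, [15, 30, 60])
for label, scores in result.items():
    print(label, scores)
```

RMSE penalizes large errors more heavily than MAE, so reporting both per bin can show whether a model's errors at long horizons come from a few large misses or from a uniform drift. A breakdown by delay change could follow the same pattern with a different grouping key.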

Problem

Research questions and friction points this paper is trying to address.

train delay prediction

standardized dataset

evaluation protocol

benchmark

Innovation

Methods, ideas, or system contributions that make the work stand out.

train delay prediction

open dataset

benchmark

graph neural networks