Hidden Leaks in Time Series Forecasting: How Data Leakage Affects LSTM Evaluation Across Configurations and Validation Strategies

📅 2025-12-07
🤖 AI Summary
This study systematically investigates the bias induced by data leakage in LSTM-based time-series forecasting evaluation. To quantify leakage effects—particularly future information contamination inherent in common validation strategies—we propose the RMSE Gain metric. We empirically compare two-way and three-way data splits against 10-fold cross-validation (CV) under pre-split and leakage-avoiding sequence construction protocols. Results show that 10-fold CV introduces substantial leakage due to temporal discontinuity, yielding an RMSE Gain of up to 20.5%; in contrast, two-way and three-way splits exhibit superior robustness, with RMSE Gain typically below 5%. Window size and lag step are identified as critical hyperparameters modulating leakage sensitivity. This work provides the first quantitative characterization of leakage magnitude in LSTM time-series evaluation and delivers empirically grounded, reproducible guidance for selecting leakage-resilient validation strategies.
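The RMSE Gain metric described above can be sketched in a few lines. This is a minimal illustration, not the paper's code; the function names (`rmse`, `rmse_gain`) and the sign convention (positive gain means the clean evaluation reports a higher error than the leaky one, i.e. leakage made the model look better) are assumptions based on the summary's description of "the relative increase in RMSE caused by leakage."

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmse_gain(rmse_clean, rmse_leaky):
    """Percentage difference between clean and leaky RMSE.

    Assumed convention: positive gain = the leakage-free evaluation
    shows a higher RMSE, so the leaky setup was overly optimistic.
    """
    return (rmse_clean - rmse_leaky) / rmse_leaky * 100.0

# Example: a clean RMSE of 1.205 against a leaky RMSE of 1.0
# corresponds to the paper's reported 20.5% gain.
print(rmse_gain(1.205, 1.0))  # 20.5
```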

📝 Abstract
Deep learning models, particularly Long Short-Term Memory (LSTM) networks, are widely used in time series forecasting due to their ability to capture complex temporal dependencies. However, evaluation integrity is often compromised by data leakage, a methodological flaw in which input-output sequences are constructed before dataset partitioning, allowing future information to unintentionally influence training. This study investigates the impact of data leakage on performance, focusing on how validation design mediates leakage sensitivity. Three widely used validation techniques (2-way split, 3-way split, and 10-fold cross-validation) are evaluated under both leaky (pre-split sequence generation) and clean conditions, with the latter mitigating leakage risk by enforcing temporal separation during data splitting prior to sequence construction. The effect of leakage is assessed using RMSE Gain, which measures the relative increase in RMSE caused by leakage, computed as the percentage difference between leaky and clean setups. Empirical results show that 10-fold cross-validation exhibits RMSE Gain values of up to 20.5% at extended lag steps. In contrast, 2-way and 3-way splits demonstrate greater robustness, typically maintaining RMSE Gain below 5% across diverse configurations. Moreover, input window size and lag step significantly influence leakage sensitivity: smaller windows and longer lags increase the risk of leakage, whereas larger windows help reduce it. These findings underscore the need for configuration-aware, leakage-resistant evaluation pipelines to ensure reliable performance estimation.
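The leaky versus clean protocols contrasted in the abstract come down to the order of two operations: building sliding-window sequences and partitioning the data. A minimal sketch of the difference, assuming a simple sliding-window constructor (`make_sequences` and the specific window/lag/split values are illustrative, not taken from the paper):

```python
import numpy as np

def make_sequences(series, window, lag):
    """Build (X, y) pairs: each input is a `window`-length slice and the
    target is the value `lag` steps after the window's end."""
    X, y = [], []
    for i in range(len(series) - window - lag + 1):
        X.append(series[i : i + window])
        y.append(series[i + window + lag - 1])
    return np.array(X), np.array(y)

series = np.arange(100, dtype=float)
window, lag, split = 10, 3, 70

# Leaky protocol: sequences are built on the FULL series first, then
# split by sequence index.
X_all, y_all = make_sequences(series, window, lag)
X_tr_leaky, X_te_leaky = X_all[:split], X_all[split:]
y_tr_leaky, y_te_leaky = y_all[:split], y_all[split:]

# Clean protocol: the raw series is split FIRST, and sequences are
# built separately within each partition, so no window or target
# crosses the temporal cut.
X_tr_clean, y_tr_clean = make_sequences(series[:split], window, lag)
X_te_clean, y_te_clean = make_sequences(series[split:], window, lag)

# Why the leaky setup contaminates training: training pair 69 covers
# series[69:79] but its target is series[81], which falls past the
# split point -- future (test-period) information reaches training.
print(y_tr_leaky[69])  # 81.0
```

With 10-fold cross-validation the problem compounds: every fold boundary is another temporal discontinuity where pre-built windows straddle train and validation data, which is consistent with the larger RMSE Gain the paper reports for that strategy.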
Problem

Research questions and friction points this paper is trying to address.

Investigates data leakage impact on LSTM time series forecasting evaluation
Compares validation strategies' robustness to leakage across configurations
Analyzes how input window size and lag step affect leakage sensitivity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using RMSE Gain to measure data leakage impact
Comparing validation strategies under leaky and clean conditions
Analyzing how window size and lag steps affect leakage
Salma Albelali
Imam Abdulrahman Bin Faisal University, Department of Computer Science, Dammam, Saudi Arabia
Moataz Ahmed
King Fahd University of Petroleum & Minerals
Artificial Intelligence