🤖 AI Summary
Existing power forecasting benchmarks suffer from significant limitations in spatiotemporal coverage and multi-energy coupling modeling, undermining their robustness for real-world deployment. To address this, we introduce Real-E, the first high-resolution electricity load benchmark covering more than 30 European countries, over 74 power stations, and a decade of data, integrating heterogeneous spatiotemporal observations and fine-grained metadata. We propose Dynamic Correlation Entropy (DCE), a novel metric that quantifies, for the first time, how the evolution of nonstationary correlation structures degrades forecasting performance. Through systematic evaluation of 20+ state-of-the-art models, we find severe generalization deficiencies: average errors exceed those of the optimal baseline by 37%, and performance collapses under cross-regional transfer. This work establishes a reproducible benchmark, introduces a new evaluation paradigm grounded in dynamic dependency analysis, and identifies concrete pathways toward improving robustness in energy time-series modeling.
📝 Abstract
Energy forecasting is vital for grid reliability and operational efficiency. Although recent advances in time series forecasting have led to progress, existing benchmarks remain limited in spatial and temporal scope and lack multi-energy features. This raises concerns about their reliability and applicability in real-world deployment. To address this, we present the Real-E dataset, covering over 74 power stations across 30+ European countries over a 10-year span with rich metadata. Using Real-E, we conduct an extensive data analysis and benchmark over 20 baselines across various model types. We introduce a new metric to quantify shifts in correlation structures and show that existing methods struggle on our dataset, which exhibits more complex and non-stationary correlation dynamics. Our findings highlight key limitations of current methods and offer a strong empirical basis for building more robust forecasting models.