Validating Deep-Learning Weather Forecast Models on Recent High-Impact Extreme Events

📅 2024-04-26
🏛️ Artificial Intelligence for the Earth Systems
📈 Citations: 3
Influential: 0
🤖 AI Summary
Deep learning (DL) weather forecasting models may be unreliable for rare, high-impact extreme events such as heatwaves and winter storms, which limits their usefulness for impact-based decision-making. Method: We propose an "impact-centered" verification approach built on case studies of recent real-world disasters. It combines multi-source reanalysis data with health-relevant risk indicators and uses spatiotemporal aggregation and variable substitution for a holistic evaluation. Contribution/Results: Benchmarking GraphCast, PanguWeather, FourCastNet, and ECMWF HRES shows that the DL models match HRES in local accuracy during the Pacific Northwest heatwave but underperform it when errors are aggregated over space and time. They substantially outperform HRES on the North American winter storm. For the South Asian humid heatwave, they lack key health-related variables, and when a substitute variable is used, they underestimate the highest danger levels over high-vulnerability regions such as Bangladesh. The work also reveals structural differences between DL models and numerical weather prediction systems in how errors accumulate, how dependencies between variables are modeled, and how compound risks are represented, and it proposes an impact-oriented framework for evaluating meteorological AI.
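
Why can a model match HRES locally yet fall behind once errors are aggregated over space and time? The toy sketch below is not the paper's metric or data. It uses two hypothetical forecasts, `forecast_a` and `forecast_b`, with identical gridpoint RMSE. In A, the errors alternate in sign and cancel in the space-time mean. In B, every cell is underestimated by the same amount, so the error survives aggregation. The function names and the temperatures are illustrative assumptions.

```python
# Toy illustration (hypothetical data, not the paper's evaluation):
# equal local (gridpoint) accuracy does not imply equal aggregated accuracy.
from math import sqrt


def pointwise_rmse(forecast, observed):
    """RMSE over all (time, location) cells: a 'local' score."""
    errors = [f - o for f_row, o_row in zip(forecast, observed)
              for f, o in zip(f_row, o_row)]
    return sqrt(sum(e * e for e in errors) / len(errors))


def aggregated_error(forecast, observed):
    """Absolute error of the space-time mean: an 'aggregated' score."""
    def mean(grid):
        values = [v for row in grid for v in row]
        return sum(values) / len(values)
    return abs(mean(forecast) - mean(observed))


# Hypothetical observed temperatures (degC): 2 time steps x 4 grid cells.
observed = [[40.0, 42.0, 44.0, 46.0],
            [41.0, 43.0, 45.0, 47.0]]

# Forecast A: errors of +1 / -1 that cancel out on average.
forecast_a = [[41.0, 41.0, 45.0, 45.0],
              [42.0, 42.0, 46.0, 46.0]]

# Forecast B: a systematic -1 degC underestimation in every cell.
forecast_b = [[39.0, 41.0, 43.0, 45.0],
              [40.0, 42.0, 44.0, 46.0]]

print(pointwise_rmse(forecast_a, observed), aggregated_error(forecast_a, observed))  # 1.0 0.0
print(pointwise_rmse(forecast_b, observed), aggregated_error(forecast_b, observed))  # 1.0 1.0
```

Both forecasts score the same locally. Only B has a spatially coherent bias, and only B carries that bias into the aggregated quantity.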

📝 Abstract
The forecast accuracy of machine learning (ML) weather prediction models is improving rapidly, leading many to speak of a “second revolution in weather forecasting”. With numerous methods being developed, and limited physical guarantees offered by ML models, there is a critical need for comprehensive evaluation of these emerging techniques. While this need has been partly fulfilled by benchmark datasets, they provide little information on rare and impactful extreme events, or on compound impact metrics, for which model accuracy might degrade due to misrepresented dependencies between variables. To address these issues, we compare ML weather prediction models (GraphCast, PanguWeather, FourCastNet) and ECMWF’s high-resolution forecast (HRES) system in three case studies: the 2021 Pacific Northwest heatwave, the 2023 South Asian humid heatwave, and the 2021 North American winter storm. We find that ML weather prediction models locally achieve similar accuracy to HRES on the record-shattering Pacific Northwest heatwave, but underperform when aggregated over space and time. However, they forecast the compound winter storm substantially better. We also highlight structural differences in how the errors of HRES and the ML models build up to that event. The ML forecasts lack important variables for a detailed assessment of the health risks of the 2023 humid heatwave. Using a possible substitute variable, we find that the prediction errors show spatial patterns, with the ML models underestimating the highest danger levels over Bangladesh. Generally, case-study-driven, impact-centric evaluation can complement existing research, increase public trust, and aid in developing reliable ML weather prediction models.
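
The abstract notes that compound impact metrics may degrade when a model misrepresents dependencies between variables. The toy sketch below is not from the paper. The temperatures, precipitation amounts, thresholds, and the helper `compound_hits` are all hypothetical. It shows a forecast that reproduces each variable's values exactly as a set, so each marginal distribution is correct. Because it pairs the values differently, it still misses every observed "cold and heavy precipitation" event.

```python
# Toy illustration (hypothetical data and thresholds, not the paper's method):
# correct marginals per variable, wrong dependence, so compound events are missed.


def compound_hits(temps, precips, t_max, p_min):
    """Indices where cold (t <= t_max) AND heavy precipitation (p >= p_min) co-occur."""
    return {i for i, (t, p) in enumerate(zip(temps, precips))
            if t <= t_max and p >= p_min}


# Hypothetical observations at 4 locations: temperature (degC), precipitation (mm).
obs_temp = [-10.0, -2.0, 3.0, 8.0]
obs_precip = [25.0, 20.0, 2.0, 0.0]

# Hypothetical forecast: identical values per variable, but the precipitation
# order is reversed, so the heavy precipitation lands on the warm locations.
fc_temp = [-10.0, -2.0, 3.0, 8.0]
fc_precip = [0.0, 2.0, 20.0, 25.0]

observed_events = compound_hits(obs_temp, obs_precip, t_max=0.0, p_min=10.0)
forecast_events = compound_hits(fc_temp, fc_precip, t_max=0.0, p_min=10.0)

print(sorted(fc_precip) == sorted(obs_precip))            # True: same marginal values
print(sorted(observed_events), sorted(forecast_events))   # [0, 1] []
```

Scores computed one variable at a time would not reveal this failure. That is the motivation for evaluating compound, impact-relevant quantities directly.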
Problem

Deep Learning
Extreme Weather Prediction
Accuracy Comparison
Innovation

Extreme Weather Prediction
Machine Learning Models
Accuracy Improvement
Olivier C. Pasche
Research Institute for Statistics and Information Science, University of Geneva
Statistics, Machine Learning, Extremes, Causality
Jonathan Wider
Department of Compound Environmental Risks, Helmholtz Centre for Environmental Research – UFZ, Leipzig, Germany; Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden/Leipzig, Germany
Zhongwei Zhang
Research Institute for Statistics and Information Science, University of Geneva, Switzerland
Jakob Zscheischler
Helmholtz Centre for Environmental Research – UFZ
Extreme Events, Compound Events, Interannual Variability, Carbon Cycle
Sebastian Engelke
Associate Professor, University of Geneva
Statistics, Extreme Value Theory, Machine Learning, Applied Probability