🤖 AI Summary
Accelerating climate change is increasing the frequency and severity of extreme events, causing them to deviate significantly from historical distributions (out-of-distribution, OOD). While modern machine learning (ML) climate models achieve high accuracy and efficiency, their generalization performance under distributional shifts remains poorly understood.
Method: This work introduces, for the first time, a systematic OOD evaluation paradigm for climate forecasting. We propose a robustness assessment framework tailored to climate models, integrating state-of-the-art ML architectures, multi-source observational and reanalysis climate data, and advanced OOD detection techniques to quantitatively evaluate predictive performance across diverse climate shift scenarios.
Contribution/Results: Experiments reveal severe limitations in current models' OOD generalization, marked by substantial performance degradation and high variance across extreme-event regimes. Model architecture critically influences robustness, with notable disparities across architectures. We identify key failure modes under distributional shift, establishing foundational insights and methodological tools for trustworthy climate risk prediction.
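To make the core idea concrete, the kind of robustness check described above can be sketched as comparing a model's error on an in-distribution split against an OOD (shifted) split. Everything below is a hypothetical toy illustration, not the paper's actual framework, models, or data:

```python
import numpy as np

def rmse(pred, target):
    """Root-mean-square error between predictions and targets."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def ood_degradation(model, id_x, id_y, ood_x, ood_y):
    """Quantify performance degradation: error on an OOD split
    relative to error on the in-distribution (ID) split."""
    id_err = rmse(model(id_x), id_y)
    ood_err = rmse(model(ood_x), ood_y)
    return {"id_rmse": id_err, "ood_rmse": ood_err,
            "degradation": ood_err / id_err}

# Toy setup: a 'model' calibrated on moderate temperatures that
# saturates outside its training range, evaluated on an extreme
# (distribution-shifted) regime it never saw.
rng = np.random.default_rng(0)
id_x = rng.normal(15.0, 3.0, 1000)    # historical regime (ID)
ood_x = rng.normal(30.0, 3.0, 1000)   # extreme-event regime (OOD)
truth = lambda x: 0.5 * x + 2.0       # assumed true relationship
model = lambda x: 0.5 * np.clip(x, 0.0, 22.0) + 2.0  # fails beyond 22

report = ood_degradation(model, id_x, truth(id_x), ood_x, truth(ood_x))
# The OOD error is far larger than the ID error, so 'degradation' >> 1.
```

In the paper's actual experiments this comparison is carried out with real ML climate architectures on observational and reanalysis data across multiple shift scenarios; the snippet only illustrates the degradation metric itself.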
📝 Abstract
Climate change is increasing the frequency and severity of unprecedented events, producing conditions that deviate from established patterns. Predicting these out-of-distribution (OOD) events is critical for assessing risks and guiding climate adaptation. While machine learning (ML) models have shown promise in providing precise, high-speed climate predictions, their ability to generalize under distribution shifts remains a significant limitation that has been underexplored in climate contexts. This research systematically evaluates state-of-the-art ML-based climate models in diverse OOD scenarios by adapting established OOD evaluation methodologies to climate data. Experiments on large-scale datasets reveal notable performance variability across scenarios, shedding light on the strengths and limitations of current models. These findings underscore the importance of robust evaluation frameworks and provide actionable insights to guide the reliable application of ML for climate risk forecasting.