🤖 AI Summary
Adapting the GraphCast model for operational weather forecasting within Canada's Global Deterministic Prediction System (GDPS) is challenging under tight data and computational resource constraints.
Method: We propose a lightweight, task-specific adaptation framework that requires only two years of analysis data and 37 GPU-days to fine-tune the 37-level GraphCast for regional meteorological forecasting. The approach uses a streamlined training curriculum centered on single-step prediction, extended into a four-stage multi-horizon autoregressive schedule (12-hour, 1-day, 2-day, and 3-day forecasts), with the 3-day stage decomposed into two memory-efficient sub-steps. We integrate gradient accumulation, memory optimization, and fine-tuning strategies specific to graph neural networks.
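The staged schedule above can be sketched as a list of curriculum stages, each defined by its rollout horizon in 6-hour model steps (GraphCast's native timestep) and a learning rate. The specific learning rates here are illustrative assumptions, not the paper's hyperparameters:

```python
# Hypothetical sketch of the abbreviated training curriculum: a single-step
# stage doing the bulk of the adaptation, followed by four autoregressive
# stages. Learning-rate values are assumed for illustration only.
CURRICULUM = [
    {"name": "single-step", "rollout_steps": 1,  "lr": 1e-4},
    {"name": "12h",         "rollout_steps": 2,  "lr": 3e-5},
    {"name": "1d",          "rollout_steps": 4,  "lr": 3e-5},
    {"name": "2d",          "rollout_steps": 8,  "lr": 1e-5},
    {"name": "3d",          "rollout_steps": 12, "lr": 1e-5},  # run as two sub-steps
]

def total_model_steps(curriculum, samples_per_stage):
    """Count the forward model evaluations implied by the schedule."""
    return sum(stage["rollout_steps"] * samples_per_stage for stage in curriculum)
```

Counting model evaluations this way makes the cost trade-off explicit: each added day of rollout horizon multiplies the forward (and backward) work per training sample.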
Results: The adapted model outperforms both the original GraphCast and the operational GDPS across forecast lead times from 1 to 10 days, delivering substantial skill improvements, particularly for key tropospheric variables. This demonstrates the feasibility and practicality of transfer learning with large meteorological models under low-data, low-compute regimes.
📝 Abstract
This work describes a process for efficiently fine-tuning the GraphCast data-driven forecast model to simulate another analysis system, here the Global Deterministic Prediction System (GDPS) of Environment and Climate Change Canada (ECCC). Using two years of training data (July 2019 -- December 2021) and 37 GPU-days of computation to tune the 37-level, quarter-degree version of GraphCast, the resulting model significantly outperforms both the unmodified GraphCast and the operational forecast, showing strong forecast skill in the troposphere over lead times from 1 to 10 days. This fine-tuning is accomplished by abbreviating DeepMind's original training curriculum for GraphCast: a shorter single-step forecast stage accomplishes the bulk of the adaptation, and the autoregressive stages are consolidated into separate 12hr, 1d, 2d, and 3d stages with larger learning rates. Additionally, training over 3d forecasts is split into two sub-steps to conserve host memory while maintaining a strong correlation with training over the full period.
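The two-sub-step 3d stage can be illustrated with a minimal sketch: backpropagate through each half of the rollout separately, detaching the midpoint state, and accumulate the two gradients. The `model_step` function and the finite-difference gradient are stand-ins for the real GraphCast step and autodiff (JAX/PyTorch); only the splitting structure reflects the paper:

```python
def model_step(x, w):
    """Stand-in for one 6 h GraphCast forward step (a real model replaces this)."""
    return w * x

def half_rollout_grad(w, x0, targets):
    """Loss and d(loss)/dw for a short rollout, via central finite differences.
    A real implementation would use reverse-mode autodiff instead."""
    def loss(w):
        x, total = x0, 0.0
        for t in targets:
            x = model_step(x, w)
            total += (x - t) ** 2
        return total
    eps = 1e-6
    return loss(w), (loss(w + eps) - loss(w - eps)) / (2 * eps)

def split_3d_substeps(w, x0, targets):
    """Sketch of the two-sub-step 3 d stage: backprop each half-rollout
    separately (detaching the midpoint state) and accumulate gradients,
    so only half the rollout's activations are held in memory at once."""
    mid = len(targets) // 2
    loss_a, grad_a = half_rollout_grad(w, x0, targets[:mid])
    # Roll forward to the midpoint state with no gradient tracking.
    x_mid = x0
    for _ in range(mid):
        x_mid = model_step(x_mid, w)
    loss_b, grad_b = half_rollout_grad(w, x_mid, targets[mid:])
    return loss_a + loss_b, grad_a + grad_b
```

Because the midpoint state is detached, the accumulated gradient ignores how errors in the first half propagate into the second; the abstract's claim is that this truncated gradient remains strongly correlated with the full-rollout gradient, which is what makes the memory saving essentially free.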