🤖 AI Summary
This study addresses the poor physical consistency and limited interpretability of data-driven models in meteorological and climate forecasting. We propose a physics-guided multimodal Transformer framework that treats distinct geophysical variables—such as temperature, pressure, and wind speed—as heterogeneous modalities, establishing the first meteorological multimodal representation paradigm. A differentiable physics-informed regularization term is introduced to explicitly incorporate physical priors, including conservation laws. By integrating cross-modal attention with meteorological spatiotemporal encoding, the framework unifies short-term weather forecasting and long-term climate simulation. Experiments demonstrate that our method significantly outperforms both purely data-driven baselines and conventional numerical weather/climate models across multiple benchmarks. It achieves superior predictive accuracy while maintaining strong physical consistency, enhanced model interpretability, and robust generalization capability.
📝 Abstract
Machine learning has developed rapidly in recent years, and many problems in meteorology can now be addressed with AI models. In particular, data-driven algorithms have significantly improved accuracy over traditional methods. Meteorological data are often converted into 2D images or 3D videos, which are then fed into AI models for learning. These models also often incorporate physical signals, such as temperature, pressure, and wind speed, to further improve accuracy and interpretability. In this paper, we review several representative AI + Weather/Climate algorithms and propose a new paradigm in which observational data from different perspectives, each with a distinct physical meaning, are treated as multimodal data and integrated via transformers. Key weather and climate knowledge can additionally be incorporated through regularization techniques to strengthen the model's capabilities. The paradigm is versatile, can address a variety of tasks, and offers strong generalizability. We also discuss future directions for improving model accuracy and interpretability.
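To make the idea of incorporating physical knowledge through regularization more concrete, the following is a minimal sketch, not the paper's implementation. It assumes a single illustrative prior that the text does not specify: a 2D wind field on a regular grid should be close to divergence-free. The function names (`divergence_penalty`, `physics_guided_loss`), the weight `lam`, and the grid spacings `dx`, `dy` are all hypothetical choices made for this example.

```python
# Illustrative sketch only: a data-fit loss plus a physics penalty.
# Assumption (not from the paper): the prior is "horizontal wind is
# approximately divergence-free", evaluated with central differences.

def mse(pred, target):
    """Mean squared error between two 2D grids given as lists of rows."""
    total, count = 0.0, 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += (p - t) ** 2
            count += 1
    return total / count


def divergence_penalty(u, v, dx=1.0, dy=1.0):
    """Mean squared divergence (du/dx + dv/dy) over interior grid points.

    u[j][i] and v[j][i] are the wind components at row j (y) and column i (x).
    The grid must be at least 3x3 so that interior points exist.
    """
    ny, nx = len(u), len(u[0])
    total, count = 0.0, 0
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            du_dx = (u[j][i + 1] - u[j][i - 1]) / (2.0 * dx)
            dv_dy = (v[j + 1][i] - v[j - 1][i]) / (2.0 * dy)
            total += (du_dx + dv_dy) ** 2
            count += 1
    return total / count


def physics_guided_loss(u_pred, v_pred, u_true, v_true, lam=0.1):
    """Data term plus lam times the physics penalty on the prediction."""
    data_term = mse(u_pred, u_true) + mse(v_pred, v_true)
    return data_term + lam * divergence_penalty(u_pred, v_pred)


# Usage: a pure rotation (u = -y, v = x) has zero divergence, while a
# radial outflow (u = x, v = y) has divergence 2 everywhere.
n = 5
u_rot = [[-float(j) for i in range(n)] for j in range(n)]
v_rot = [[float(i) for i in range(n)] for j in range(n)]
u_out = [[float(i) for i in range(n)] for j in range(n)]
v_out = [[float(j) for i in range(n)] for j in range(n)]

print(divergence_penalty(u_rot, v_rot))
print(divergence_penalty(u_out, v_out))
```

In a trained model the same penalty would be written with a differentiable array library so that its gradient flows back to the network weights. The pure-Python version here only illustrates how the data term and the physics term combine into one objective.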