🤖 AI Summary
Problem: Existing environmental forecasting research focuses on numerical meteorological variables (e.g., temperature) and does not translate them into actionable, natural-language descriptions of events.

Method: We introduce Weather and Climate Event Forecasting (WCEF), a new cross-modal task that bridges gridded meteorological data (ERA5) and textual event narratives. We construct CLLMate, the first multimodal benchmark that aligns news-reported weather and climate events with ERA5 reanalysis data (26,156 samples), and systematically evaluate 23 multimodal large language models (MLLMs) in zero-shot and few-shot settings.

Contribution/Results: The evaluation reveals critical deficiencies of current MLLMs in event-level semantic understanding and spatiotemporal reasoning. Experiments validate CLLMate as an effective resource for both training and rigorously evaluating WCEF models, establishing it as the first domain-specific, cross-modal benchmark for environmental intelligence.
📝 Abstract
Forecasting weather and climate events is crucial for taking appropriate measures to mitigate environmental hazards and minimize losses. However, existing environmental forecasting research focuses narrowly on predicting numerical meteorological variables (e.g., temperature), neglecting the translation of these variables into actionable textual narratives of events and their consequences. To bridge this gap, we propose Weather and Climate Event Forecasting (WCEF), a new task that leverages numerical meteorological raster data and textual event data to predict weather and climate events. The task is challenging because multimodal data are hard to align and supervised datasets are lacking. To address these challenges, we present CLLMate, the first multimodal dataset for WCEF, built from 26,156 environmental news articles aligned with ERA5 reanalysis data. We systematically benchmark 23 multimodal large language models (MLLMs) on CLLMate, including closed-source, open-source, and our own fine-tuned models. Our experiments reveal the advantages and limitations of existing MLLMs and demonstrate the value of CLLMate for training and benchmarking models on the WCEF task.
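The abstract does not describe how a news-reported event is paired with ERA5 data. Purely as an illustration of what one WCEF sample could look like, the sketch below assumes each event has a date and a point location, and that the ERA5 fields for that date are stored on the common 0.25° regular latitude/longitude grid (721 × 1440 cells, latitude descending from 90°, longitude running 0° to 359.75°). The names `WCEFSample`, `grid_index`, and `crop_patch` are hypothetical and are not part of CLLMate.

```python
from dataclasses import dataclass
from datetime import date

import numpy as np

# Assumed grid layout (hypothetical, not taken from CLLMate):
# 0.25 degree spacing, latitude 90 -> -90, longitude 0 -> 359.75.
GRID_RES = 0.25
N_LAT = 721
N_LON = 1440


@dataclass
class WCEFSample:
    """Hypothetical pairing of a raster patch with an event narrative."""
    event_date: date
    lat: float
    lon: float
    raster: np.ndarray  # shape (variables, H, W)
    event_text: str     # textual event narrative (the prediction target)


def grid_index(lat: float, lon: float) -> tuple[int, int]:
    """Return the nearest (row, col) cell on the assumed grid."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be in [-90, 90]")
    row = int(round((90.0 - lat) / GRID_RES))  # row 0 is the North Pole
    # Map longitudes such as -0.25 onto 359.75 and wrap 360 back to column 0.
    col = int(round((lon % 360.0) / GRID_RES)) % N_LON
    return row, col


def crop_patch(fields: np.ndarray, lat: float, lon: float, half: int) -> np.ndarray:
    """Cut a (2*half+1) x (2*half+1) window around the event location.

    `fields` has shape (variables, N_LAT, N_LON). Longitude wraps around the
    globe; latitude is clipped at the poles, so edge rows repeat there.
    """
    row, col = grid_index(lat, lon)
    rows = np.clip(np.arange(row - half, row + half + 1), 0, N_LAT - 1)
    cols = np.arange(col - half, col + half + 1) % N_LON
    return fields[:, rows[:, None], cols[None, :]]


if __name__ == "__main__":
    fields = np.zeros((2, N_LAT, N_LON), dtype=np.float32)  # two dummy variables
    sample = WCEFSample(
        event_date=date(2020, 1, 1),
        lat=34.75,
        lon=113.625,
        raster=crop_patch(fields, 34.75, 113.625, half=4),
        event_text="placeholder event narrative",
    )
    print(sample.raster.shape)
```

A real pipeline would also have to handle events that name regions rather than points and decide which time steps to include; the fixed square window here is only one simple choice.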