ClimateBench-M: A Multi-Modal Climate Data Benchmark with a Simple Generative Method

📅 2025-04-10
🤖 AI Summary
Existing climate benchmarks lack a unified spatiotemporal granularity and do not integrate heterogeneous modalities such as reanalysis time series, extreme-event labels, and satellite imagery, which hinders multimodal climate modeling. Method: We propose ClimateBench-M, a unified multimodal climate benchmark that aligns ERA5 reanalysis time series, NOAA extreme-event annotations, and NASA HLS satellite imagery at a consistent spatiotemporal resolution. To address cross-modal modeling challenges, we design a lightweight conditional generative framework that combines spatiotemporal registration, cross-modal alignment via representation learning, and a Transformer-based architecture for joint time-series and image generation, moving beyond unimodal paradigms. Contribution/Results: The proposed generative method achieves competitive performance on the weather forecasting, thunderstorm prediction, and crop segmentation tasks in ClimateBench-M. The data and code are publicly released, providing a standardized infrastructure and methodology for foundation-model research in climate science.

📝 Abstract
Climate science studies the structure and dynamics of Earth's climate system and seeks to understand how climate changes over time. Its data is usually stored as time series that record climate features, geolocation, time attributes, etc. Recently, climate benchmarks have attracted much research attention. In addition to the most common task of weather forecasting, several pioneering benchmarks extend the modality, either toward domain-specific applications such as tropical cyclone intensity prediction and flash flood damage estimation, or toward climate statements and confidence levels expressed in natural language. To further motivate the development of artificial general intelligence for climate science, in this paper, we first contribute a multi-modal climate benchmark, ClimateBench-M, which aligns (1) time series climate data from ERA5, (2) extreme weather event data from NOAA, and (3) satellite image data from NASA HLS at a unified spatial-temporal granularity. Second, for each data modality, we propose a simple but strong generative method that achieves competitive performance on the weather forecasting, thunderstorm alert, and crop segmentation tasks in ClimateBench-M. The data and code of ClimateBench-M are publicly available at https://github.com/iDEA-iSAIL-Lab-UIUC/ClimateBench-M.
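
To make the idea of a "unified spatial-temporal granularity" concrete, here is a minimal sketch of how gridded time-series rows and point-like event records could be joined on a shared key. It is not the authors' pipeline: the 0.25 degree grid spacing, the hourly bucket, and the names `cell_key` and `align` are all assumptions chosen only for illustration.

```python
from datetime import datetime

GRID_DEG = 0.25  # assumed grid spacing in degrees (illustrative, not from the paper)


def cell_key(lat, lon, ts, grid_deg=GRID_DEG):
    """Snap a (lat, lon, timestamp) triple to a (row, col, hour) key."""
    row = round(lat / grid_deg)   # index of the nearest grid row
    col = round(lon / grid_deg)   # index of the nearest grid column
    hour = ts.replace(minute=0, second=0, microsecond=0)  # floor to the hour
    return (row, col, hour)


def align(series_rows, event_rows):
    """Attach the event types that share a cell and hour with each series row.

    series_rows: iterable of (lat, lon, timestamp, feature_dict)
    event_rows:  iterable of (lat, lon, timestamp, event_type)
    """
    events = {}
    for lat, lon, ts, kind in event_rows:
        events.setdefault(cell_key(lat, lon, ts), set()).add(kind)

    aligned = []
    for lat, lon, ts, feats in series_rows:
        key = cell_key(lat, lon, ts)
        aligned.append({**feats, "key": key, "events": sorted(events.get(key, ()))})
    return aligned


# Hypothetical records: two hourly readings at one grid point and one nearby event.
series = [
    (40.0, -88.25, datetime(2020, 5, 1, 12), {"t2m": 290.1}),
    (40.0, -88.25, datetime(2020, 5, 1, 13), {"t2m": 291.0}),
]
storm_reports = [(40.1, -88.2, datetime(2020, 5, 1, 12, 37), "thunderstorm")]

for row in align(series, storm_reports):
    print(row["key"][2], row["t2m"], row["events"])
```

The same key could in principle index satellite image tiles, so that all three modalities are looked up by one (cell, time) pair. A real benchmark would also need to handle differing revisit times and spatial footprints, which this sketch ignores.
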
Problem

Research questions and friction points this paper is trying to address.

Develops a multi-modal climate benchmark for diverse tasks
Aligns time series, weather events, and satellite image data
Proposes generative methods for weather and environmental predictions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal climate benchmark with unified alignment
Simple generative method for diverse tasks
Publicly available data and code repository
Dongqi Fu
Research Scientist, Meta AI
Geometric Deep Learning, Sequence Modeling, Probabilistic Graphical Models
Yada Zhu
IBM
graph, large language model, time series, finance
Zhining Liu
University of Illinois Urbana-Champaign, IBM Research
Lecheng Zheng
University of Illinois at Urbana-Champaign
Heterogeneous Learning, Graph Mining, Multi-modal Learning, Anomaly Detection, Multi-label Learning
Xiao Lin
University of Illinois Urbana-Champaign, IBM Research
Zihao Li
University of Illinois Urbana-Champaign, IBM Research
Liri Fang
University of Illinois Urbana-Champaign, IBM Research
Katherine Tieu
University of Illinois Urbana-Champaign, IBM Research
Onkar Bhardwaj
University of Illinois Urbana-Champaign, IBM Research
Kommy Weldemariam
University of Illinois Urbana-Champaign, IBM Research
Hanghang Tong
University of Illinois at Urbana-Champaign
Large Scale Data Mining, Graph Mining, Social Networks, Healthcare, Multimedia
Hendrik Hamann
Professor at Stony Brook University
Physics, IoT, ML, AI, Geospatial
Jingrui He
University of Illinois at Urbana-Champaign
Machine Learning, Data Mining, Social Networks, Medical Informatics, Semiconductor Manufacturing