MMClima: A Framework for Multimodal Climate Science Data and Evaluation

📅 2026-06-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of existing climate question-answering benchmarks, which are typically small and text-centric and therefore offer little basis for evaluating how well multimodal AI systems reason across sources in climate science. To bridge this gap, the authors introduce MMClima, a multimodal question-answering framework spanning five core climate science domains and drawing on scientific articles, video transcriptions, and figures. An automated pipeline of claim extraction and question-answer synthesis, followed by human-in-the-loop validation, yields more than 104,000 expert-validated QA pairs. The authors release the dataset, a standardized evaluation pipeline, the data creation framework, and the weights of mmclima-70b-txt, a model fine-tuned on the textual split that outperforms strong open- and closed-source models on textual climate QA.

📝 Abstract

Climate change research increasingly requires AI systems that reason across text, dynamic visual content, and scientific figures, yet existing climate QA benchmarks are small, mostly textual, and cover a narrow range of models. We introduce MMClima, a large-scale multimodal climate question answering framework with 104k+ expert-validated question-answer pairs spanning articles, video transcriptions, and figures across five core climate science domains. MMClima is constructed via automated claim extraction and QA synthesis with human-in-the-loop validation to ensure both scale and reliability. Using MMClima, we benchmark state-of-the-art multimodal language models on tasks requiring factual recall, visual interpretation, and cross-modal synthesis. We additionally fine-tune on the textual split to produce mmclima-70b-txt, a domain-adapted baseline that outperforms strong open- and closed-source models on textual QA. We release the dataset, evaluation pipeline, fine-tuned model weights, and data creation framework to support standardized multimodal evaluation for climate science.
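
To make the benchmarking step concrete, here is a minimal sketch of how predictions might be scored per modality. It is an illustration under stated assumptions, not the released evaluation pipeline: the `QAPair` record layout, the modality labels, and the use of normalized exact match as the metric are all hypothetical, since the abstract does not specify the schema or the scoring method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QAPair:
    # Hypothetical record layout; the actual MMClima schema is not given here.
    question: str
    answer: str
    modality: str  # assumed labels, e.g. "article", "video_transcript", "figure"


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def accuracy_by_modality(pairs, predictions):
    """Return exact-match accuracy per modality for aligned pairs and predictions.

    Exact match is an assumed metric chosen for simplicity.
    """
    if len(pairs) != len(predictions):
        raise ValueError("pairs and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for pair, pred in zip(pairs, predictions):
        total[pair.modality] += 1
        if normalize(pred) == normalize(pair.answer):
            correct[pair.modality] += 1
    return {modality: correct[modality] / total[modality] for modality in total}
```

Reporting accuracy separately for each modality, rather than as one pooled number, keeps a strong textual score from masking weaker performance on figures or video transcriptions.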

Problem

Research questions and friction points this paper is trying to address.

multimodal

climate science

question answering

benchmark

AI evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal

climate science

question answering

human-in-the-loop

domain adaptation

🔎 Similar Papers

The impact of internal variability on benchmarking deep learning climate emulators

2024-08-09 · arXiv.org · Citations: 1