🤖 AI Summary
This work addresses the limitations of existing climate question-answering benchmarks, which are small, mostly textual, and cover a narrow range of models, making it hard to evaluate how well multimodal AI systems reason across sources in climate science. The authors introduce MMClima, a large-scale multimodal question-answering framework spanning five core climate science domains and drawing on scientific articles, video transcriptions, and figures. A pipeline of automated claim extraction and QA synthesis, followed by human-in-the-loop validation, yields over 104,000 expert-validated QA pairs. The authors release the dataset, the evaluation pipeline, the data creation framework, and fine-tuned weights for mmclima-70b-txt, a model fine-tuned on the textual split that outperforms strong open- and closed-source models on textual climate QA.
📝 Abstract
Climate change research increasingly requires AI systems that reason across text, dynamic visual content, and scientific figures, yet existing climate QA benchmarks are small, mostly textual, and cover a narrow range of models. We introduce MMClima, a large-scale multimodal climate question answering framework with 104k+ expert-validated question-answer pairs spanning articles, video transcriptions, and figures across five core climate science domains. MMClima is constructed via automated claim extraction and QA synthesis with human-in-the-loop validation to ensure both scale and reliability. Using MMClima, we benchmark state-of-the-art multimodal language models on tasks requiring factual recall, visual interpretation, and cross-modal synthesis. We additionally fine-tune on the textual split to produce mmclima-70b-txt, a domain-adapted baseline that outperforms strong open- and closed-source models on textual QA. We release the dataset, evaluation pipeline, fine-tuned model weights, and data creation framework to support standardized multimodal evaluation for climate science.
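The construction process described above (claim extraction, then QA synthesis, then human-in-the-loop validation) can be pictured as a three-stage filter. The following is only a minimal sketch under stated assumptions: the `QAPair` fields, the `build_dataset` function, the toy stage functions, and the `"oceans"` domain label are all hypothetical placeholders for illustration, not the schema or code released with MMClima.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


# Hypothetical record type; field names are illustrative, not MMClima's schema.
@dataclass
class QAPair:
    question: str
    answer: str
    modality: str  # e.g. "article", "video_transcript", "figure"
    domain: str    # placeholder label; the abstract does not name the five domains
    validated: bool = False


def build_dataset(
    sources: Iterable[dict],
    extract_claims: Callable[[dict], List[str]],
    synthesize_qa: Callable[[str, dict], QAPair],
    validate: Callable[[QAPair], bool],
) -> List[QAPair]:
    """Run claim extraction -> QA synthesis -> validation; keep accepted pairs."""
    accepted = []
    for source in sources:
        for claim in extract_claims(source):
            pair = synthesize_qa(claim, source)
            if validate(pair):  # stands in for the human-in-the-loop review
                pair.validated = True
                accepted.append(pair)
    return accepted


# Toy stand-ins for the three stages (assumptions, not the authors' methods).
def toy_extract(source: dict) -> List[str]:
    # Treat each sentence of the source text as one claim.
    return [s.strip() for s in source["text"].split(".") if s.strip()]


def toy_synthesize(claim: str, source: dict) -> QAPair:
    return QAPair(
        question=f"Is the following stated in the source? {claim}",
        answer="yes",
        modality=source["modality"],
        domain=source["domain"],
    )


def toy_validate(pair: QAPair) -> bool:
    # Reject questions built from claims too short to be meaningful.
    return len(pair.question.split()) > 8


sources = [
    {"text": "Sea level is rising. Yes.", "modality": "article", "domain": "oceans"}
]
dataset = build_dataset(sources, toy_extract, toy_synthesize, toy_validate)
```

Passing the stages in as functions keeps the sketch neutral about how each one is actually implemented; in this toy run the one-word claim is rejected by the validation stand-in, so only one pair survives.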