CliME: Evaluating Multimodal Climate Discourse on Social Media and the Climate Alignment Quotient (CAQ)

📅 2025-04-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the lack of benchmarks for evaluating how well large language models (LLMs) understand and generate climate discourse on social media, particularly multimodal content such as humorous memes and skeptical posts. We introduce CliME, a first-of-its-kind multimodal climate evaluation benchmark built from 2,579 Twitter and Reddit image-text posts. To assess model performance, we propose the Climate Alignment Quotient (CAQ), a metric with five dimensions (Articulation, Evidence, Resonance, Transition, and Specificity), and three analytical lenses: Actionability, Criticality, and Justice. Leveraging multimodal data curation, social-science-informed prompt engineering, and human-in-the-loop annotation, our systematic evaluation reveals a consistent gap: most evaluated LLMs perform relatively well on Criticality and Justice but underperform on Actionability, that is, in generating concrete, actionable climate recommendations. Among the models evaluated, Claude 3.7 Sonnet achieves the highest overall performance. The CliME dataset and code are publicly released.

📝 Abstract
The rise of Large Language Models (LLMs) has raised questions about their ability to understand climate-related contexts. Though climate change dominates social media, analyzing its multimodal expressions is understudied, and current tools have failed to determine whether LLMs amplify credible solutions or spread unsubstantiated claims. To address this, we introduce CliME (Climate Change Multimodal Evaluation), a first-of-its-kind multimodal dataset, comprising 2579 Twitter and Reddit posts. The benchmark features a diverse collection of humorous memes and skeptical posts, capturing how these formats distill complex issues into viral narratives that shape public opinion and policy discussions. To systematically evaluate LLM performance, we present the Climate Alignment Quotient (CAQ), a novel metric comprising five distinct dimensions: Articulation, Evidence, Resonance, Transition, and Specificity. Additionally, we propose three analytical lenses: Actionability, Criticality, and Justice, to guide the assessment of LLM-generated climate discourse using CAQ. Our findings, based on the CAQ metric, indicate that while most evaluated LLMs perform relatively well in Criticality and Justice, they consistently underperform on the Actionability axis. Among the models evaluated, Claude 3.7 Sonnet achieves the highest overall performance. We publicly release our CliME dataset and code to foster further research in this domain.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' understanding of climate-related social media content
Assessing multimodal climate discourse credibility and alignment
Developing metrics for LLM performance in climate communication
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces CliME dataset for multimodal climate discourse
Develops Climate Alignment Quotient (CAQ) metric
Proposes Actionability, Criticality, Justice analytical lenses
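
As a rough illustration of how a five-dimension metric such as CAQ might be combined into a single number, here is a minimal sketch. The abstract names the five dimensions but does not give the aggregation formula, so the weighted mean, the [0, 1] score range, the equal default weights, and the names `CAQScores` and `caq` are all assumptions made for this example, not the authors' definition.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass(frozen=True)
class CAQScores:
    """Per-dimension scores; the [0, 1] range is an assumption of this sketch."""
    articulation: float
    evidence: float
    resonance: float
    transition: float
    specificity: float


def caq(scores: CAQScores, weights: Optional[Dict[str, float]] = None) -> float:
    """Hypothetical aggregate: weighted mean of the five dimension scores.

    The paper's actual CAQ formula is not stated in the abstract; equal
    weights are used here only as a neutral default.
    """
    names = [f.name for f in fields(scores)]
    if weights is None:
        weights = {name: 1.0 for name in names}
    if set(weights) != set(names):
        raise ValueError("weights must cover exactly the five CAQ dimensions")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    for name in names:
        value = getattr(scores, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value} is outside [0, 1]")
    return sum(weights[n] * getattr(scores, n) for n in names) / total


# Example: one model response scored under a single lens (e.g. Actionability).
response_scores = CAQScores(
    articulation=0.8, evidence=0.6, resonance=0.7, transition=0.5, specificity=0.4
)
overall = caq(response_scores)
```

Under this reading, the same function would be applied once per analytical lens (Actionability, Criticality, Justice), giving three comparable scores per model; passing a custom `weights` dictionary lets a reader explore how sensitive a ranking is to the weighting choice.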