🤖 AI Summary
Existing research lacks the high-quality corpora and benchmarks needed to rigorously evaluate how well large language models (LLMs) understand the societal impacts of disruptive weather, which hinders their deployment in climate adaptation. To address this, we introduce the first LLM evaluation benchmark specifically designed for understanding the societal impacts of extreme weather, constructed from regional newspaper coverage of disaster responses. The framework comprises two complementary tasks: multi-label classification and relevance-ranked question answering. Key contributions include: (1) establishing the first evaluation paradigm for weather impact understanding; (2) proposing a four-stage high-fidelity data curation pipeline (news cleaning, structured information extraction, domain-adaptive annotation, and task-specific adaptation); and (3) open-sourcing the first Disruptive Weather Impact (DWI) dataset and evaluation code. Empirical results reveal critical limitations of state-of-the-art LLMs in causal reasoning and cross-event generalization.
📝 Abstract
Climate change adaptation requires an understanding of how disruptive weather affects society, a task to which large language models (LLMs) might be applicable. Their effectiveness remains under-explored, however, because high-quality corpora are difficult to collect and benchmarks are lacking. Regional newspapers record climate-related events and how communities adapted to and recovered from disasters, but processing the original corpus is non-trivial. In this study, we first develop a disruptive weather impact dataset using a well-crafted four-stage construction pipeline. We then propose WXImpactBench, the first benchmark for evaluating the capacity of LLMs to understand disruptive weather impacts. The benchmark involves two evaluation tasks: multi-label classification and ranking-based question answering. Extensive experiments evaluating a set of LLMs provide a first-hand analysis of the challenges in developing systems for disruptive weather impact understanding and climate change adaptation. The constructed dataset and the code for the evaluation framework are available to help society reduce its vulnerability to disasters.