DisasterBench: A Multimodal Benchmark for UAV-Based Disaster Response in Complex Environments

📅 2026-06-04

🤖 AI Summary

Existing multimodal benchmarks for UAV-based disaster response are largely confined to perception tasks and cover a limited range of disaster types. As a result, they do not adequately test the multi-stage causal reasoning and decision-making that real-world operations require. To address this gap, this work proposes DisasterBench, a fine-grained, multi-stage reasoning benchmark spanning 14 disaster-related scene types and 9 response-critical tasks across the pre-, during-, and post-disaster phases. It also introduces DisasterVL, a lightweight multimodal model designed for edge deployment. DisasterVL is trained with a three-stage pipeline: domain-specific instruction tuning, chain-of-thought-guided multimodal alignment, and reinforcement learning-based policy optimization. It remains efficient, with only 2 billion parameters. Experiments across 21 popular multimodal models show that DisasterVL outperforms all evaluated open-source models, achieves reasoning accuracy comparable to GPT-4o, and substantially narrows the performance gap with state-of-the-art closed-source models.
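The summary does not say how the reinforcement learning stage scores sampled outputs. One common choice for reasoning-oriented policy optimization is a rule-based reward that checks both output format and answer correctness. The Python sketch below illustrates that idea under assumed conventions. The `<think>`/`<answer>` tags, the function name `disaster_reward`, and the `format_weight` value are all hypothetical and are not taken from DisasterVL.

```python
import re

# Hypothetical output convention (not from the paper): reasoning inside
# <think>...</think>, followed by the final answer inside <answer>...</answer>.
PATTERN = re.compile(r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL)


def disaster_reward(completion: str, gold: str, format_weight: float = 0.2) -> float:
    """Return a reward in [0, 1] for one sampled completion (hypothetical scheme)."""
    match = PATTERN.match(completion.strip())
    if match is None:
        # Malformed output: no format credit and no answer credit.
        return 0.0
    answer = match.group(2).strip().lower()
    correct = answer == gold.strip().lower()
    # Well-formed outputs earn format_weight; a correct answer earns the rest.
    return format_weight + (1.0 - format_weight) * float(correct)
```

For example, `disaster_reward("<think>water is rising</think><answer>B</answer>", "b")` returns full credit, while a bare `"B"` with no tags returns 0. Giving partial credit for format is a design choice that encourages the model to keep emitting a parseable chain of thought early in training.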

📝 Abstract

When a disaster unfolds, responders must answer not only what is happening, but also why it is happening, what will happen next, and what to do now, often from noisy low-altitude UAV views and under tight on-site compute constraints. However, most existing multimodal benchmarks emphasize perception (e.g., recognition/description), cover limited disaster types, and provide insufficient support for the multi-stage reasoning required in practical emergency response. We introduce DisasterBench, a multi-stage multimodal reasoning benchmark for UAV-based disaster response in complex environments. DisasterBench spans 14 disaster-related scene types and 9 response-critical tasks across pre-, during-, and post-disaster stages, with fine-grained disaster-task mappings that explicitly test causal attribution, propagation prediction, damage analysis, and decision-oriented reasoning. To enable reasoning on the edge, we further propose DisasterVL, a lightweight multimodal model optimized with a three-stage pipeline combining domain instruction tuning, chain-of-thought-guided multimodal alignment, and reinforcement learning-based policy optimization. Experiments across 21 popular MLLMs show that our 2B-parameter DisasterVL outperforms all evaluated open-source models and substantially narrows the gap to state-of-the-art closed-source models, achieving GPT-4o-comparable reasoning accuracy with superior efficiency. The project page is available at https://github.com/TanmouTT/DisasterBench.
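The abstract reports accuracy over tasks grouped into pre-, during-, and post-disaster stages, but it does not describe the scoring procedure. The following is a minimal sketch of stage-wise accuracy aggregation. It assumes that each benchmark item records its scene type, stage, task, and a single gold answer, and that predictions are scored by case-insensitive exact match. The `Item` layout, the `stage_accuracy` function, and the stage assigned to each example item are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three response phases named in the abstract.
STAGES = ("pre", "during", "post")


@dataclass
class Item:
    # Hypothetical record layout for one benchmark question.
    scene: str  # one of the 14 disaster-related scene types
    stage: str  # "pre", "during", or "post"
    task: str   # one of the 9 response-critical tasks
    gold: str   # reference answer


def stage_accuracy(items, predictions):
    """Exact-match accuracy per stage plus overall; predictions align with items."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for item, pred in zip(items, predictions):
        if item.stage not in STAGES:
            raise ValueError(f"unknown stage: {item.stage}")
        correct = pred.strip().lower() == item.gold.strip().lower()
        # Count each item toward its own stage and toward the overall score.
        for key in (item.stage, "overall"):
            totals[key] += 1
            hits[key] += int(correct)
    # Stages with no items are omitted rather than reported as 0.
    return {key: hits[key] / totals[key] for key in totals}
```

Reporting per-stage scores alongside the overall number shows whether a model's errors cluster in one phase, for example in forward-looking propagation prediction versus post-event damage analysis.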

Problem

multimodal benchmark

UAV-based disaster response

multi-stage reasoning

complex environments

emergency response

Innovation

multimodal reasoning

UAV-based disaster response

lightweight multimodal model