🤖 AI Summary
Traditional post-wildfire property damage assessment suffers from time-consuming manual processes, reliance on labeled training data, and limited accuracy due to single-view imagery—leading to delayed emergency response. To address these limitations, this paper proposes a zero-shot, deployable visual-language model (VLM)-driven assessment framework. Methodologically, it integrates multi-view ground-level images with structured disaster prompts and leverages dual parallel pipelines—VLM for visual reasoning and large language model (LLM) for semantic interpretation—to enable fine-grained building damage classification without model fine-tuning. A key contribution is the empirical validation of VLMs’ capability to model subtle structural damage through multi-view reasoning. Evaluated on real-world Eaton and Palisades wildfires in California, the framework achieves F1-scores of 0.857–0.947, significantly outperforming single-view baselines (p < 0.01), while offering high accuracy, strong interpretability, and rapid deployment.
📝 Abstract
The escalating intensity and frequency of wildfires demand innovative computational methods for rapid and accurate property damage assessment. Traditional methods are often time-consuming, while modern computer vision approaches typically require extensive labeled datasets, hindering immediate post-disaster deployment. This research introduces a novel, zero-shot framework leveraging pre-trained vision-language models (VLMs) to classify damage from ground-level imagery. We propose and evaluate two pipelines, a VLM-only approach (Pipeline A) and a VLM + large language model (LLM) approach (Pipeline B), applied to the 2025 Eaton and Palisades fires in California; both integrate structured prompts based on specific wildfire damage indicators. A primary scientific contribution of this study is demonstrating the VLMs' efficacy in synthesizing information from multiple perspectives to identify nuanced damage, a critical limitation in existing literature. Our findings reveal that while single-view assessments struggled to classify affected structures (F1 scores ranging from 0.225 to 0.511), the multi-view analysis yielded dramatic improvements (F1 scores ranging from 0.857 to 0.947). Moreover, the McNemar test confirmed that pipelines with multi-view image assessment yield statistically significant classification improvements; however, the differences this research observed between Pipelines A and B were not statistically significant. Thus, future research can explore the potential of LLM prompting in damage assessment. The practical contribution is an immediately deployable, flexible, and interpretable workflow that bypasses the need for supervised training, significantly accelerating triage and prioritization for disaster response practitioners.
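The multi-view, structured-prompt idea can be illustrated with a minimal sketch: assemble several ground-level views of one property into a single prompt built from damage indicators, then map the VLM's free-text answer onto a discrete damage class. The class names, indicator list, and prompt wording below are illustrative assumptions, not the paper's exact scheme, and the VLM call itself is left out.

```python
# Illustrative damage scale, ordered most severe first; the paper's actual
# label set may differ.
DAMAGE_CLASSES = ["Destroyed", "Major", "Minor", "Affected", "No Damage"]

def build_prompt(num_views: int, indicators: list[str]) -> str:
    """Assemble a structured multi-view damage-assessment prompt for a VLM."""
    indicator_lines = "\n".join(f"- {ind}" for ind in indicators)
    return (
        f"You are shown {num_views} ground-level photos of the same property "
        "taken after a wildfire. Considering all views together, check for:\n"
        f"{indicator_lines}\n"
        "Classify the overall damage as one of: "
        + ", ".join(DAMAGE_CLASSES) + "."
    )

def parse_damage_label(response_text: str) -> str:
    """Map a free-text VLM response to a class, scanning most severe first."""
    lowered = response_text.lower()
    for label in DAMAGE_CLASSES:
        if label.lower() in lowered:
            return label
    return "Unclassified"
```

In a deployed pipeline, `build_prompt` would be sent alongside the encoded images to the VLM, and `parse_damage_label` applied to its reply; scanning severe classes first is one simple way to resolve responses that mention several labels.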