Satellite-to-Street: Synthesizing Post-Disaster Views from Satellite Imagery via Generative Vision Models

📅 2026-03-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the scarcity of post-disaster street-level imagery, which hinders accurate ground-level structural damage assessment, noting that while satellite imagery is readily available, it lacks a ground-level perspective. The work presents the first systematic exploration of generating post-disaster street views from satellite images, introducing two novel strategies: a vision-language model (VLM)-guided generation approach and a damage-sensitive Mixture-of-Experts (MoE) architecture, alongside a structure-aware evaluation framework. Experiments across 300 disaster scenarios show that ControlNet achieves the highest semantic accuracy (0.71), whereas the VLM and MoE methods produce more photorealistic textures but suffer from reduced semantic clarity, revealing a critical trade-off between visual realism and structural fidelity in synthetic street-view generation for disaster assessment.

📝 Abstract
In the immediate aftermath of natural disasters, rapid situational awareness is critical. Traditionally, satellite observations are widely used to estimate damage extent. However, they lack the ground-level perspective essential for characterizing specific structural failures and impacts. Meanwhile, ground-level data (e.g., street-view imagery) remains largely inaccessible during time-sensitive events. This study investigates Satellite-to-Street View Synthesis to bridge this data gap. We introduce two generative strategies to synthesize post-disaster street views from satellite imagery: a Vision-Language Model (VLM)-guided approach and a damage-sensitive Mixture-of-Experts (MoE) method. We benchmark these against general-purpose baselines (Pix2Pix, ControlNet) using a proposed Structure-Aware Evaluation Framework. This multi-tier protocol integrates (1) pixel-level quality assessment, (2) ResNet-based semantic consistency verification, and (3) a novel VLM-as-a-Judge for perceptual alignment. Experiments on 300 disaster scenarios reveal a critical realism–fidelity trade-off: while diffusion-based approaches (e.g., ControlNet) achieve high perceptual realism, they often hallucinate structural details. Quantitative results show that standard ControlNet achieves the highest semantic accuracy, 0.71, whereas VLM-enhanced and MoE models excel in textural plausibility but struggle with semantic clarity. This work establishes a baseline for trustworthy cross-view synthesis, emphasizing that visually realistic generations may still fail to preserve critical structural information required for reliable disaster assessment.
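The first two tiers of the evaluation protocol can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it uses PSNR as a stand-in pixel-level quality metric and cosine similarity between feature embeddings as the semantic-consistency check (in the paper those embeddings would come from a ResNet backbone; here they are passed in directly, and all function names are assumed for illustration).

```python
import numpy as np


def psnr(ref: np.ndarray, gen: np.ndarray, max_val: float = 255.0) -> float:
    """Tier 1, pixel-level quality: peak signal-to-noise ratio in dB."""
    mse = np.mean((ref.astype(np.float64) - gen.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)


def semantic_consistency(feat_ref: np.ndarray, feat_gen: np.ndarray) -> float:
    """Tier 2, semantic consistency: cosine similarity between feature
    embeddings of the reference and generated street views."""
    num = float(np.dot(feat_ref, feat_gen))
    den = float(np.linalg.norm(feat_ref) * np.linalg.norm(feat_gen)) + 1e-12
    return num / den


# Toy example: a reference image and a lightly noised "generation".
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64, 3)).astype(np.float64)
gen = np.clip(ref + rng.normal(0.0, 5.0, ref.shape), 0, 255)
print(f"PSNR: {psnr(ref, gen):.2f} dB")

# Identical embeddings give consistency 1.0; unrelated ones drift toward 0.
feat = rng.normal(size=512)
print(f"Self-consistency: {semantic_consistency(feat, feat):.3f}")
```

The third tier (VLM-as-a-Judge) is omitted here, since it requires prompting a vision-language model rather than a closed-form metric.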
Problem

Research questions and friction points this paper is trying to address.

Satellite-to-Street Synthesis
Post-Disaster Assessment
Ground-Level View Generation
Cross-View Synthesis
Situational Awareness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Satellite-to-Street Synthesis
Vision-Language Model (VLM)
Mixture-of-Experts (MoE)
Structure-Aware Evaluation
Cross-View Generation