🤖 AI Summary
This paper targets two key challenges in satellite-to-street-view image generation: imprecise pose alignment and uncontrollable environmental conditions (e.g., illumination and weather). It proposes a controllable diffusion framework guided jointly by geometry and semantics, built from two components: (1) an Iterative Homography Adjustment (IHA) scheme, applied during denoising, that explicitly enforces geometric constraints and improves spatial consistency; and (2) a text-guided, zero-shot illumination and weather control mechanism that uses a CLIP text encoder to drive a conditional sampling scheduler and requires no paired training data. Experiments report a 12.6% improvement in pose alignment accuracy (mAP), together with high-fidelity, diverse street-view synthesis across multiple lighting and weather conditions. The framework achieves new state-of-the-art results in both visual realism and controllability.
📝 Abstract
Generating street-view images from satellite imagery is a challenging task, particularly when it comes to maintaining accurate pose alignment and incorporating diverse environmental conditions. Although diffusion models have shown promise in generative tasks, their ability to maintain strict pose alignment throughout the diffusion process is limited. In this paper, we propose a novel Iterative Homography Adjustment (IHA) scheme, applied during the denoising process, that effectively addresses pose misalignment and ensures spatial consistency in the generated street-view images. In addition, currently available datasets for satellite-to-street-view generation offer limited diversity in illumination and weather conditions, which restricts the generalizability of the generated outputs. To mitigate this, we introduce a text-guided, illumination- and weather-controlled sampling strategy that enables fine-grained control over these environmental factors. Extensive quantitative and qualitative evaluations demonstrate that our approach significantly improves pose accuracy and enhances the diversity and realism of the generated street-view images, setting a new benchmark for satellite-to-street-view generation.
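The abstract does not describe the internals of IHA. As a rough illustration only, the sketch below shows a generic building block that a homography-based alignment scheme could rely on: estimating a 3x3 homography from point correspondences with the standard Direct Linear Transform (DLT), then refining it iteratively by composing small corrections. The function names, the correspondence inputs, and the refinement loop are hypothetical assumptions, not the authors' IHA. In a diffusion setting one would presumably re-derive the target points from the intermediate prediction at each denoising step, which this sketch does not model.

```python
import numpy as np


def apply_homography(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map (N, 2) points through a 3x3 homography using homogeneous coordinates."""
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homo @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


def estimate_homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Estimate H with dst ~ H @ src via the Direct Linear Transform.

    src, dst: (N, 2) arrays with N >= 4 correspondences. No coordinate
    normalization is applied, so large pixel coordinates may be ill-conditioned.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # The least-squares solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def iterative_homography_adjustment(src: np.ndarray, dst: np.ndarray, n_iters: int = 3) -> np.ndarray:
    """Hypothetical refinement loop: warp with the current estimate, fit a residual
    correction to the targets, and compose it onto the running homography."""
    H = np.eye(3)
    for _ in range(n_iters):
        warped = apply_homography(H, src)
        correction = estimate_homography(warped, dst)
        H = correction @ H
        H = H / H[2, 2]
    return H


if __name__ == "__main__":
    # Synthetic example: recover a known homography from exact correspondences.
    H_true = np.array([[1.1, 0.05, 2.0], [0.02, 0.95, -1.0], [0.001, 0.002, 1.0]])
    src = np.array([[0, 0], [10, 0], [10, 10], [0, 10], [5, 3], [2, 8]], dtype=float)
    dst = apply_homography(H_true, src)
    H_est = iterative_homography_adjustment(src, dst)
    print("max reprojection error:", np.abs(apply_homography(H_est, src) - dst).max())
```

With exact, noise-free correspondences the first iteration already recovers the homography and later corrections are close to the identity; the iterative form only becomes meaningful when the targets change or are noisy between steps, as they might across denoising iterations.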