Cross-view Localization and Synthesis - Datasets, Challenges and Opportunities

📅 2025-10-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Cross-view localization and synthesis aim to establish geometric and semantic correspondences between satellite/aerial and ground-level images, yet face challenges including large viewpoint discrepancies, resolution mismatches, and occlusions. This paper presents a systematic survey of prevailing datasets, methodological advancements, and fundamental bottlenecks in the field. It unifies, for the first time, the technical trajectories of localization—primarily based on CNN- or ViT-driven feature matching—and synthesis—leveraging hybrid generative paradigms integrating GANs and diffusion models. We introduce an open-source benchmark platform incorporating state-of-the-art models and standardized evaluation protocols. Cross-method analysis reveals current performance ceilings and generalization limitations, identifying joint “viewpoint-scale-semantic” modeling as the critical breakthrough direction. Finally, we outline promising future research avenues, including multimodal alignment, incorporation of 3D priors, and lightweight deployment strategies.

📝 Abstract
Cross-view localization and synthesis are two fundamental tasks in cross-view visual understanding, which deal with cross-view datasets: overhead (satellite or aerial) and ground-level imagery. These tasks have gained increasing attention due to their broad applications in autonomous navigation, urban planning, and augmented reality. Cross-view localization aims to estimate the geographic position of ground-level images based on information provided by overhead imagery, while cross-view synthesis seeks to generate ground-level images from overhead imagery. Both tasks remain challenging due to significant differences in viewing perspective, resolution, and occlusion that are widely embedded in cross-view datasets. Recent years have witnessed rapid progress driven by the availability of large-scale datasets and novel approaches. Typically, cross-view localization is formulated as an image-retrieval problem in which ground-level features are matched against features of tiled overhead images, with convolutional neural networks (CNNs) or vision transformers (ViTs) used for cross-view feature embedding. Cross-view synthesis, on the other hand, generally relies on generative adversarial networks (GANs) or diffusion models. This paper presents a comprehensive survey of advances in cross-view localization and synthesis, reviewing widely used datasets, highlighting key challenges, and providing an organized overview of state-of-the-art techniques. Furthermore, it discusses current limitations, offers comparative analyses, and outlines promising directions for future research. The project page is available at https://github.com/GDAOSU/Awesome-Cross-View-Methods.
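The retrieval formulation described in the abstract can be sketched in a few lines: embed the ground-level query and every overhead tile into a shared feature space, then rank tiles by cosine similarity. This is a minimal illustration, not the paper's method; the embedding networks (CNN or ViT backbones) are stood in for by random vectors, and names like `retrieve_tile` are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Scale each embedding to unit length so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def retrieve_tile(ground_emb, tile_embs):
    """Return the index of the overhead tile whose embedding is most
    similar (by cosine) to the ground-level query embedding."""
    q = l2_normalize(ground_emb)
    db = l2_normalize(tile_embs)
    scores = db @ q  # (N,) cosine similarity of each tile to the query
    return int(np.argmax(scores)), scores

# Toy example: 4 tile embeddings; the query is a noisy copy of tile 2,
# standing in for a real ground-to-overhead embedding pair.
rng = np.random.default_rng(0)
tiles = rng.normal(size=(4, 128))
query = tiles[2] + 0.05 * rng.normal(size=128)
best, scores = retrieve_tile(query, tiles)
print(best)  # tile 2 is retrieved as the closest match
```

In a real system the two embeddings would come from separately trained (often weight-shared or Siamese) ground and overhead branches, and the database dot product would be computed against millions of tiles.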
Problem

Research questions and friction points this paper is trying to address.

Estimating geographic positions using overhead and ground imagery
Generating ground-level views from satellite or aerial images
Addressing perspective and resolution differences in cross-view datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using CNNs and ViTs for cross-view feature embedding
Applying GANs and diffusion models for view synthesis
Formulating localization as image retrieval with tiled matching
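Retrieval-based localization of this kind is commonly scored with recall@k: the fraction of queries whose true tile appears among the top-k retrieved candidates. A minimal sketch of that metric follows, assuming the common benchmark convention that query i matches database tile i (an illustrative assumption, not a detail taken from this paper):

```python
import numpy as np

def recall_at_k(similarity, k=1):
    """Fraction of queries whose ground-truth tile lands in the top k.

    similarity: (num_queries, num_tiles) score matrix, where query i's
    ground-truth tile is assumed to be tile i.
    """
    n = similarity.shape[0]
    # Score each query assigns to its ground-truth tile.
    gt = similarity[np.arange(n), np.arange(n)]
    # Rank of the ground truth within each row (0 = retrieved first).
    ranks = (similarity > gt[:, None]).sum(axis=1)
    return float((ranks < k).mean())

# Toy example: queries 0 and 2 rank their true tile first; query 1 does not.
sim = np.array([[0.9, 0.1, 0.2],
                [0.8, 0.3, 0.1],
                [0.2, 0.1, 0.7]])
print(recall_at_k(sim, k=1))  # 2 of 3 queries succeed at k=1
```

Benchmarks in this area typically report recall@1, recall@5, recall@10, and recall at the top 1% of the database.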