Beyond Matching to Tiles: Bridging Unaligned Aerial and Satellite Views for Vision-Only UAV Navigation

2026-03-23
Citations: 0
Influential: 0
AI Summary
This work proposes Bearing-UAV, a purely vision-based cross-view navigation method that jointly predicts a drone's absolute position and heading and explicitly models spatial relationships to overcome the limitations of conventional map-tile matching. Existing methods trade localization accuracy against storage overhead and neglect heading, which makes them unreliable under the large viewpoint discrepancies and misalignment common in real-world scenarios. In contrast, Bearing-UAV combines global and local structural features with relative spatial encoding for end-to-end localization without requiring pre-stored map tiles. Evaluated on Bearing-UAV-90k, a newly curated multi-city benchmark with diverse terrains, the method achieves lower localization error than current matching- and retrieval-based approaches, enabling lightweight and robust navigation in GNSS-denied environments.
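The accuracy/storage trade-off of tile matching can be illustrated with simple arithmetic. The sketch below is not taken from the paper: it assumes a hypothetical square survey area covered by a regular grid of tile centers, a fixed per-tile descriptor size, and retrieval that returns the nearest tile center. The function name `tile_grid_cost` and all numbers are illustrative assumptions.

```python
import math


def tile_grid_cost(side_m: float, stride_m: float, bytes_per_tile: int):
    """Return (tile count, storage in bytes, worst-case error in metres).

    Assumes a square area of side `side_m` covered by tile centers spaced
    `stride_m` apart, and that retrieval returns the nearest tile center.
    """
    per_axis = math.ceil(side_m / stride_m)
    n_tiles = per_axis ** 2
    # The farthest a point can lie from its nearest grid center is half
    # the diagonal of one grid cell.
    worst_error_m = stride_m * math.sqrt(2) / 2
    return n_tiles, n_tiles * bytes_per_tile, worst_error_m


if __name__ == "__main__":
    # Hypothetical 10 km x 10 km area with 2 KiB of descriptor per tile.
    for stride in (200.0, 100.0, 50.0):
        n, size, err = tile_grid_cost(10_000.0, stride, 2048)
        print(f"stride={stride:6.1f} m  tiles={n:6d}  "
              f"storage={size / 1e6:7.2f} MB  worst error={err:6.1f} m")
```

Under these assumptions, halving the tile spacing halves the worst-case quantization error but quadruples the number of stored tiles, which is the kind of tension the summary refers to.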

๐Ÿ“ Abstract
Recent advances in cross-view geo-localization (CVGL) methods have shown strong potential for supporting unmanned aerial vehicle (UAV) navigation in GNSS-denied environments. However, existing work predominantly focuses on matching UAV views to onboard map tiles, which introduces an inherent trade-off between accuracy and storage overhead and overlooks the importance of the UAV's heading during navigation. Moreover, the substantial discrepancies and varying overlaps in cross-view scenarios have been insufficiently considered, limiting the generalization of these methods to real-world scenarios. In this paper, we present Bearing-UAV, a purely vision-driven cross-view navigation method that jointly predicts UAV absolute location and heading from neighboring features, enabling accurate, lightweight, and robust navigation in the wild. Our method leverages global and local structural features and explicitly encodes relative spatial relationships, making it robust to cross-view variations, misalignment, and feature-sparse conditions. We also present Bearing-UAV-90k, a multi-city benchmark for evaluating cross-view localization and navigation. Extensive experiments show that Bearing-UAV yields lower localization error than previous matching and retrieval paradigms across diverse terrains. Our code and dataset will be made publicly available.
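Because the method predicts both location and heading, an evaluation needs an error measure for each. The abstract does not specify the metrics used, so the following is only a minimal sketch under assumptions: positions are given as latitude/longitude in degrees, localization error is the haversine great-circle distance, and heading error is the absolute angular difference wrapped to [0, 180] degrees. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def localization_error_m(lat1, lon1, lat2, lon2):
    """Haversine distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def heading_error_deg(pred_deg, true_deg):
    """Absolute heading difference in degrees, wrapped to [0, 180]."""
    # Shift by 180, wrap into [0, 360), shift back: result is in [-180, 180).
    diff = (pred_deg - true_deg + 180.0) % 360.0 - 180.0
    return abs(diff)


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    print(localization_error_m(30.0, 120.0, 30.001, 120.0))
    print(heading_error_deg(350.0, 10.0))
```

The wrap-around step matters for heading: a prediction of 350 degrees against a true heading of 10 degrees is 20 degrees off, not 340.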
Problem

Research questions and friction points this paper is trying to address.

cross-view geo-localization
UAV navigation
GNSS-denied environments
view misalignment
heading estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-view geo-localization
vision-only navigation
UAV heading estimation
spatial relationship encoding
GNSS-denied navigation
Authors
Kejia Liu
College of Computer Science and Technology, Zhejiang University
Haoyang Zhou
College of Computer Science and Technology, Zhejiang University
Ruoyu Xu
Zhejiang University, ByteDance
Peicheng Wang
College of Computer Science and Technology, Zhejiang University
Mingli Song
College of Computer Science and Technology, Zhejiang University; State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security
Haofei Zhang
State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security