DINO-GFSA: Geo-Localization via Semantic Gated Fusion and Mamba-based Sequential Aggregation

📅 2026-05-30

🤖 AI Summary
This work addresses cross-view geo-localization for unmanned aerial vehicles in GNSS-denied environments, where achieving both semantic robustness and fine-grained spatial detail remains difficult. The authors propose an approach that uses a LoRA-finetuned DINOv3 backbone to extract multi-scale features, introduces a semantic-gated residual fusion module to bridge the gap between semantic and spatial information, and incorporates Mamba-based sequence modeling (described as used here for the first time in this domain) to capture long-range dependencies with linear computational complexity. The method achieves state-of-the-art performance on both the University-1652 and DenseUAV benchmarks, notably surpassing the previous best Recall@1 on DenseUAV by 3.48%.
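For readers unfamiliar with the reported metric, the following is a minimal sketch of Recall@K as it is commonly defined for retrieval tasks, not the benchmarks' official evaluation code. The function name `recall_at_k`, the cosine-similarity ranking, and the label-array layout are assumptions made for illustration; the exact protocols of University-1652 and DenseUAV may differ.

```python
import numpy as np


def recall_at_k(query_feats, gallery_feats, query_labels, gallery_labels, k=1):
    """Fraction of queries whose top-k retrieved gallery items contain a correct match.

    query_feats:    (Q, D) array of query embeddings (e.g. UAV views)
    gallery_feats:  (G, D) array of gallery embeddings (e.g. satellite views)
    query_labels:   (Q,) integer location ids (assumed layout)
    gallery_labels: (G,) integer location ids (assumed layout)
    """
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)

    sims = q @ g.T                                # (Q, G) similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]       # indices of the k most similar items

    # A query counts as a hit if any of its top-k items shares its label.
    hits = (gallery_labels[topk] == query_labels[:, None]).any(axis=1)
    return float(hits.mean())
```

With `k=1` this is Recall@1: the share of queries whose single nearest gallery item is a true match.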
📝 Abstract
Cross-view geo-localization (CVGL) is critical for Unmanned Aerial Vehicle (UAV) self-positioning and target localization in GNSS-denied environments. However, acquiring robust semantics while preserving fine-grained spatial details remains challenging. To address this, we propose DINO-GFSA, a framework leveraging a LoRA-adapted (Low-Rank Adaptation) DINOv3 (ViT-L) backbone for parameter-efficient, high-capacity representation. Crucially, we introduce a Semantic Gated Residual Fusion module, which utilizes high-level semantics to selectively calibrate and integrate low-level spatial cues, effectively bridging the semantic gap. Furthermore, a Mamba-based Sequential Aggregation Head is designed to capture long-range spatial dependencies with linear complexity. Experiments demonstrate state-of-the-art performance on the University-1652 and DenseUAV benchmarks, notably surpassing the previous best on DenseUAV by 3.48% on Recall@1. These results validate DINO-GFSA as a generalized, robust solution for UAV CVGL.
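The abstract does not give the equations of the Semantic Gated Residual Fusion module, so the sketch below only illustrates the general idea of a semantics-driven gate on a residual branch. Everything in it is an assumption: the function name `semantic_gated_residual_fusion`, the sigmoid gate computed from the high-level tokens, the linear projection of the low-level tokens, and the parameter names `w_gate`, `b_gate`, and `w_proj`. It should not be read as the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def semantic_gated_residual_fusion(low, high, w_gate, b_gate, w_proj):
    """Hypothetical gated residual fusion of two token sequences.

    low:    (N, C_low)   low-level spatial tokens (assumed shallow-layer features)
    high:   (N, C_high)  high-level semantic tokens (assumed deep-layer features)
    w_gate: (C_high, C_high) and b_gate: (C_high,)  gate parameters (assumed)
    w_proj: (C_low, C_high)  projects low-level tokens to the semantic width (assumed)
    """
    # The gate is computed from the semantic tokens only, so the semantics
    # decide, per token and per channel, how much spatial detail to admit.
    gate = sigmoid(high @ w_gate + b_gate)        # values in (0, 1)

    low_proj = low @ w_proj                       # match channel width

    # Residual form: semantic features pass through unchanged, and the
    # gated low-level cues are added on top.
    return high + gate * low_proj
```

In this form a gate near 0 leaves the semantic features untouched, while a gate near 1 adds the projected spatial cues in full, which is one plausible reading of "selectively calibrate and integrate".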
Problem

Research questions and friction points this paper is trying to address.

Cross-view geo-localization
UAV
Semantic representation
Spatial details
GNSS-denied environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Gated Fusion
Mamba-based Aggregation
LoRA-adapted DINOv3
Cross-view Geo-localization
Parameter-efficient Learning
Authors
Beier Hu
School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen, China
Yuanshen Guo
School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen, China
Jialu Cai
School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen, China
Chengwei Li
School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen, China
Yong Wang
Professor of Computer Science, Ocean University of China
Software Engineering, Operational Research, Machine Learning
Shunan Wu
School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen, China
Zhigang Wu
School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen, China