DINO-GFSA: Geo-Localization via Semantic Gated Fusion and Mamba-based Sequential Aggregation

📅 2026-05-30

🤖 AI Summary

This work addresses cross-view geo-localization for unmanned aerial vehicles in GNSS-denied environments, where it remains difficult to obtain robust semantics while preserving fine-grained spatial detail. The authors propose an approach with three parts: a LoRA-finetuned DINOv3 backbone that extracts multi-scale features, a semantic-gated residual fusion module that bridges the gap between semantic and spatial information, and Mamba-based sequence modeling (used here for the first time in this domain) that captures long-range dependencies with linear computational complexity. The method achieves state-of-the-art performance on both the University-1652 and DenseUAV benchmarks, improving Recall@1 on DenseUAV by 3.48% over the previous best.
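The Recall@1 figure quoted above is the fraction of queries whose single nearest gallery item has the correct location label. The sketch below shows one common way to compute it, assuming cosine similarity over feature vectors; the function names and the toy data are hypothetical and are not taken from the paper's evaluation code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_at_1(query_feats, query_labels, gallery_feats, gallery_labels):
    """Fraction of queries whose top-1 gallery match shares their label."""
    hits = 0
    for q, q_label in zip(query_feats, query_labels):
        # Index of the most similar gallery feature for this query.
        best = max(range(len(gallery_feats)),
                   key=lambda i: cosine(q, gallery_feats[i]))
        hits += int(gallery_labels[best] == q_label)
    return hits / len(query_feats)


# Toy example (made-up vectors): two gallery locations, three queries.
gallery = [[0.9, 0.1], [0.1, 0.9]]
gallery_ids = ["a", "b"]
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.2]]
query_ids = ["a", "b", "b"]  # the third query is retrieved incorrectly
score = recall_at_1(queries, query_ids, gallery, gallery_ids)
```

In this toy setup two of the three queries retrieve the right location, so `score` is 2/3.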

📝 Abstract

Cross-view geo-localization (CVGL) is critical for Unmanned Aerial Vehicle (UAV) self-positioning and target localization in GNSS-denied environments. However, acquiring robust semantics while preserving fine-grained spatial details remains challenging. To address this, we propose DINO-GFSA, a framework leveraging a LoRA-adapted (Low-Rank Adaptation) DINOv3 (ViT-L) backbone for parameter-efficient, high-capacity representation. Crucially, we introduce a Semantic Gated Residual Fusion module, which utilizes high-level semantics to selectively calibrate and integrate low-level spatial cues, effectively bridging the semantic gap. Furthermore, a Mamba-based Sequential Aggregation Head is designed to capture long-range spatial dependencies with linear complexity. Experiments demonstrate state-of-the-art performance on the University-1652 and DenseUAV benchmarks, notably surpassing the previous best on DenseUAV by 3.48% on Recall@1. These results validate DINO-GFSA as a generalized, robust solution for UAV CVGL.
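The abstract does not give the equations of the Semantic Gated Residual Fusion module. As a rough illustration of the general idea only, the sketch below assumes one plausible form: a sigmoid gate computed from the high-level (semantic) token decides, per channel, how much of the low-level (spatial) token is added back as a residual. The function `gated_residual_fusion`, its parameters, and the assumption that both feature maps share the same token count and channel width are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_residual_fusion(high, low, gate_weight, gate_bias):
    """Fuse per-token features as: fused = high + gate(high) * low.

    high, low   : lists of tokens, each token a list of C channel values
    gate_weight : C x C matrix (list of rows) for the assumed gate projection
    gate_bias   : list of C bias values
    """
    fused = []
    for h_tok, l_tok in zip(high, low):
        # Gate in (0, 1) per channel, driven only by the semantic token.
        gate = [sigmoid(sum(w * h for w, h in zip(row, h_tok)) + b)
                for row, b in zip(gate_weight, gate_bias)]
        # Residual form: semantics pass through, spatial cues are gated in.
        fused.append([h + g * l for h, g, l in zip(h_tok, gate, l_tok)])
    return fused


# Toy example (made-up values): one token with two channels.
high_feats = [[1.0, 2.0]]
low_feats = [[4.0, 6.0]]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
half_open = gated_residual_fusion(high_feats, low_feats, zero_w, [0.0, 0.0])
closed = gated_residual_fusion(high_feats, low_feats, zero_w, [-50.0, -50.0])
```

With zero weights and zero bias the gate sits at 0.5, so half of the low-level feature is mixed in; a strongly negative bias closes the gate and the output falls back to the semantic feature alone. That fallback is the main appeal of a residual formulation.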

Problem

Research questions and friction points this paper is trying to address.

Cross-view geo-localization

UAV

Semantic representation

Spatial details

GNSS-denied environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Gated Fusion

Mamba-based Aggregation

LoRA-adapted DINOv3
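For readers unfamiliar with LoRA, the sketch below shows the standard low-rank update it applies to a frozen weight matrix: the effective weight is W + (alpha / r) * B * A, where only the small factors A and B are trained. It is a generic illustration in plain Python, not the paper's training code; the helper names, matrix sizes, and the alpha value are assumptions chosen for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A.

    w : frozen weight, d_out x d_in
    a : trainable down-projection, r x d_in
    b : trainable up-projection, d_out x r
    """
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)  # d_out x d_in low-rank update
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def lora_trainable_params(d_out, d_in, rank):
    """Number of trainable values in A and B combined."""
    return rank * (d_in + d_out)


# Toy example (made-up values): 2 x 3 frozen weight, rank 1.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
a = [[1.0, 2.0, 3.0]]
b_zero = [[0.0], [0.0]]  # B is typically initialised to zero
b_trained = [[1.0], [2.0]]
at_init = lora_effective_weight(w, a, b_zero, alpha=2.0)
after_training = lora_effective_weight(w, a, b_trained, alpha=2.0)
```

Because B starts at zero, the adapted layer initially behaves exactly like the pretrained one. The parameter saving comes from the rank: for a 1024 x 1024 layer at rank 8, `lora_trainable_params` gives 16,384 trainable values against 1,048,576 in the full matrix.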