🤖 AI Summary
To address degraded matching robustness in Visual Place Recognition (VPR) under severe cross-scene variations in appearance, viewpoint, and environment, this paper proposes a spatial-visual co-adaptive graph exploration training framework. Methodologically, it (1) constructs an online geo-visual graph and introduces a Soft Probing module to optimize local feature aggregation; (2) designs a weighted-clique expansion strategy for dynamic hard-negative mining; and (3) integrates residual weight learning, bilinear feature aggregation, parameter-efficient fine-tuning, and dynamic graph clustering. Leveraging a frozen DINOv2 backbone, the method achieves state-of-the-art performance across eight mainstream benchmarks: Recall@1 reaches 98.9% on SPED, 95.8% on Pitts30k-test, 94.5% on MSLS-val, and 96.0% on Nordland; notably, it attains 100% Recall@10 on SPED using only 4096-dimensional descriptors.
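The geo-visual graph described above fuses geographic proximity with current visual similarity into a single affinity. A minimal sketch of one plausible fusion, assuming a Gaussian kernel over planar coordinates and a convex combination weight `alpha` (the function name, kernel choice, and parameters are illustrative, not from the paper):

```python
import numpy as np

def build_geo_visual_edges(coords, embeddings, geo_sigma=25.0, alpha=0.5):
    """Fuse geographic proximity and visual similarity into edge weights.

    coords:     (N, 2) planar positions; embeddings: (N, D) descriptors.
    All names and hyperparameters here are assumptions for illustration.
    """
    # Pairwise geographic distances, turned into a Gaussian affinity.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    geo_aff = np.exp(-(d ** 2) / (2 * geo_sigma ** 2))
    # Cosine similarity of L2-normalized embeddings.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    vis_aff = z @ z.T
    # Convex combination: rebuilding this each epoch lets the graph
    # track the evolving embedding landscape.
    return alpha * geo_aff + (1 - alpha) * vis_aff
```

Recomputing `vis_aff` from the current model's embeddings each epoch is what makes the graph "online" in the sense the summary describes.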
📝 Abstract
Visual Place Recognition (VPR) requires robust retrieval of geotagged images despite large appearance, viewpoint, and environmental variation. Prior methods focus on descriptor fine-tuning or fixed sampling strategies, yet neglect the dynamic interplay between spatial context and visual similarity during training. We present SAGE (Spatial-visual Adaptive Graph Exploration), a unified training pipeline that enhances fine-grained spatial-visual discrimination by jointly improving local feature aggregation, training-time sample organization, and hard sample mining. We introduce a lightweight Soft Probing module that learns data-driven residual weights for patch descriptors before bilinear aggregation, boosting distinctive local cues. During training, we reconstruct an online geo-visual graph that fuses geographic proximity and current visual similarity, so that candidate neighborhoods reflect the evolving embedding landscape. To concentrate learning on the most informative place neighborhoods, we seed clusters from high-affinity anchors and iteratively expand them with a greedy weighted clique expansion sampler. Implemented with a frozen DINOv2 backbone and parameter-efficient fine-tuning, SAGE achieves state-of-the-art performance across eight benchmarks. It attains 98.9%, 95.8%, 94.5%, and 96.0% Recall@1 on SPED, Pitts30k-test, MSLS-val, and Nordland, respectively. Notably, our method obtains 100% Recall@10 on SPED using only 4096-D global descriptors. Code and models will be available at: https://github.com/chenshunpeng/SAGE.
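The greedy weighted clique expansion mentioned in the abstract can be sketched as follows: starting from a high-affinity seed, repeatedly add the candidate node with the highest total affinity to the current cluster members. This is a hedged illustration of the general technique; the function name, the size parameter, and the sum-affinity selection rule are assumptions, not the paper's exact algorithm:

```python
import numpy as np

def greedy_clique_expand(affinity, seed, size=4):
    """Greedily grow a cluster from a seed node on an affinity graph.

    affinity: (N, N) symmetric edge-weight matrix (e.g. a fused
    geo-visual affinity). Illustrative sketch, not the paper's code.
    """
    n = affinity.shape[0]
    clique = [seed]
    candidates = set(range(n)) - {seed}
    while len(clique) < size and candidates:
        # Pick the candidate most strongly tied to ALL current members,
        # keeping the expanded set clique-like (mutually high affinity).
        best = max(candidates, key=lambda j: affinity[j, clique].sum())
        clique.append(best)
        candidates.remove(best)
    return clique
```

Sampling training batches from such clusters concentrates learning on neighborhoods where places are mutually confusable, which is where hard negatives live.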