VISTA: Monocular Segmentation-Based Mapping for Appearance and View-Invariant Global Localization

📅 2025-07-15
🤖 AI Summary
Problem: Existing global localization methods are not robust enough for cross-session or cross-map localization in unstructured environments, primarily because of appearance variations induced by viewpoint shifts, seasonal changes, and other domain-specific factors. Method: This paper proposes a lightweight, monocular, segmentation-based global localization framework that, for the first time, integrates open-set semantic segmentation with submap geometric-consistency matching, providing appearance and viewpoint invariance without domain-specific training. The approach combines monocular segmentation, object-level tracking, and a submap correspondence search that aligns reference frames using geometric consistency between environment maps. Contribution/Results: Evaluated on seasonal and oblique-view aerial datasets, the method achieves up to a 69% improvement in recall over baselines. Its object-based map is only 0.6% the size of the most memory-conservative baseline, making real-time deployment on resource-constrained platforms feasible.

📝 Abstract
Global localization is critical for autonomous navigation, particularly in scenarios where an agent must localize within a map generated in a different session or by another agent, as agents often have no prior knowledge about the correlation between reference frames. However, this task remains challenging in unstructured environments due to appearance changes induced by viewpoint variation, seasonal changes, spatial aliasing, and occlusions -- known failure modes for traditional place recognition methods. To address these challenges, we propose VISTA (View-Invariant Segmentation-Based Tracking for Frame Alignment), a novel open-set, monocular global localization framework that combines: 1) a front-end, object-based, segmentation and tracking pipeline, followed by 2) a submap correspondence search, which exploits geometric consistencies between environment maps to align vehicle reference frames. VISTA enables consistent localization across diverse camera viewpoints and seasonal changes, without requiring any domain-specific training or finetuning. We evaluate VISTA on seasonal and oblique-angle aerial datasets, achieving up to a 69% improvement in recall over baseline methods. Furthermore, we maintain a compact object-based map that is only 0.6% the size of the most memory-conservative baseline, making our approach capable of real-time implementation on resource-constrained platforms.
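To make the idea of "geometric consistencies between environment maps" concrete, here is a minimal sketch, not the authors' implementation. It assumes each map is a list of 2D object centroids and that a set of putative object matches is already available. Two matches are treated as consistent when the distance between the two objects is the same in both maps, up to a tolerance. The function names, the tolerance `tol`, and the greedy selection strategy are all illustrative assumptions; the paper does not specify them here.

```python
import math

def agrees(map_a, map_b, c, d, tol):
    """True if matches c=(i, j) and d=(k, l) preserve inter-object distance."""
    (i, j), (k, l) = c, d
    if i == k or j == l:
        return False  # enforce one-to-one matching
    dist_a = math.dist(map_a[i], map_a[k])
    dist_b = math.dist(map_b[j], map_b[l])
    return abs(dist_a - dist_b) <= tol

def greedy_consistent_set(map_a, map_b, candidates, tol=0.5):
    """Greedily grow a set of mutually consistent matches (a heuristic,
    assumed here for illustration; it is not guaranteed to be optimal)."""
    score = {
        c: sum(agrees(map_a, map_b, c, d, tol) for d in candidates if d != c)
        for c in candidates
    }
    selected = []
    # Visit the most widely supported matches first.
    for c in sorted(candidates, key=lambda c: -score[c]):
        if all(agrees(map_a, map_b, c, s, tol) for s in selected):
            selected.append(c)
    return selected

# Hypothetical data: map_b holds the first three objects of map_a rotated
# by 90 degrees and shifted by (5, 5), plus one unrelated object.
map_a = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (10.0, 10.0)]
map_b = [(5.0, 5.0), (5.0, 9.0), (2.0, 5.0), (20.0, 1.0)]
candidates = [(0, 0), (1, 1), (2, 2), (3, 3), (0, 1)]
matches = greedy_consistent_set(map_a, map_b, candidates)
```

Because inter-object distances do not depend on the pose of either map frame, a check of this kind needs no prior knowledge of how the two reference frames are related.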
Problem

Research questions and friction points this paper is trying to address.

Monocular global localization in unstructured environments
Handling appearance changes from viewpoint and seasons
Achieving real-time performance on resource-constrained platforms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monocular segmentation-based mapping for view-invariance
Object-based segmentation and tracking pipeline
Submap correspondence search for geometric consistency
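Once matched objects are available, the two reference frames can be aligned. The sketch below is an illustration under stated assumptions rather than the method used in the paper: it assumes planar (2D) object centroids and already-matched pairs, and it uses the standard closed-form least-squares fit of a rotation and translation. The function name `align_2d` and the sample points are hypothetical.

```python
import math

def align_2d(src, dst):
    """Least-squares rigid fit: returns (theta, (tx, ty)) such that
    rotating src by theta and translating by (tx, ty) best matches dst."""
    n = len(src)
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n
    cy_d = sum(p[1] for p in dst) / n

    # Accumulate dot and cross terms of the centered point pairs.
    s_cos = 0.0
    s_sin = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        xs, ys = xs - cx_s, ys - cy_s
        xd, yd = xd - cx_d, yd - cy_d
        s_cos += xs * xd + ys * yd
        s_sin += xs * yd - ys * xd

    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, (tx, ty)

# Hypothetical matched centroids: dst is src rotated 90 degrees, shifted by (5, 5).
src = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
dst = [(5.0, 5.0), (5.0, 9.0), (2.0, 5.0)]
theta, (tx, ty) = align_2d(src, dst)
```

A full 3D system would estimate a 3D rotation instead, but the structure is the same: remove the centroids, solve for the rotation, then recover the translation.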
Hannah Shafferman
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Annika Thomas
Massachusetts Institute of Technology, Columbia University, Stanford University
Estimation, Autonomy, Control Systems, Small Satellites
Jouko Kinnari
Member of technical staff, Nest AI Oy
Robotic perception, localization, UAVs, IoT, R&D
Michael Ricard
Charles Stark Draper Laboratory, Inc., Cambridge, MA, USA
Jose Nino
Charles Stark Draper Laboratory, Inc., Cambridge, MA, USA
Jonathan How
Massachusetts Institute of Technology, Cambridge, MA 02139, USA