SR-Nav: Spatial Relationships Matter for Zero-shot Object Goal Navigation

πŸ“… 2026-03-18
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of zero-shot object-goal navigation, where existing methods struggle with reliable perception and planning under limited viewpoints or weak semantic cues. To overcome this, the authors propose SR-Nav, a novel framework that introduces a goal-centric Dynamic Spatial Relationship Graph (DSRG)β€”the first of its kindβ€”to encode observations and empirical spatial relationships using foundation models. SR-Nav incorporates a relation-matching module to enhance perceptual robustness and a dynamic relation-planning module to optimize path search. By effectively integrating visual perception with structured scene priors, the method significantly improves reasoning capability, success rate, and navigation efficiency under partial observability on the HM3D benchmark, demonstrating the critical role of explicit spatial relationship modeling in zero-shot navigation.

πŸ“ Abstract
Zero-shot object-goal navigation aims to find target objects in unseen environments using only egocentric observations. Recent methods leverage foundation models' comprehension and reasoning capabilities to enhance navigation performance. However, when faced with poor viewpoints or weak semantic cues, foundation models often fail to support reliable reasoning in both perception and planning, resulting in inefficient or failed navigation. We observe that inherent relationships among objects and regions encode structured scene priors, which help agents infer plausible target locations even under partial observations. Motivated by this insight, we propose Spatial Relation-aware Navigation (SR-Nav), a framework that models both observed and experience-based spatial relationships to enhance perception and planning alike. Specifically, SR-Nav first constructs a Dynamic Spatial Relationship Graph (DSRG) that encodes target-centered spatial relationships through foundation models and updates dynamically with real-time observations. We then introduce a Relation-aware Matching Module, which replaces naive detection with relationship matching, leveraging the diverse relationships in the DSRG to verify and correct perceptual errors and thereby improve visual robustness. Finally, we design a Dynamic Relationship Planning Module that reduces the planning search space by dynamically computing optimal paths over the DSRG from the current position, guiding planning and reducing exploration redundancy. Experiments on HM3D show that our method achieves state-of-the-art performance in both success rate and navigation efficiency. The code will be publicly available at https://github.com/Mzyw-1314/SR-Nav
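The abstract's core machinery can be illustrated with a minimal sketch: a goal-centric graph whose edges carry spatial relations with confidence scores, updated as new observations arrive, and planned over by preferring well-supported relations. This is an illustrative sketch only; the class name, the symmetric-edge assumption, and the (1 − confidence) edge cost are assumptions, not the authors' DSRG implementation.

```python
import heapq
from collections import defaultdict


class DynamicSpatialRelationGraph:
    """Toy goal-centric spatial relation graph (illustrative, not SR-Nav's code)."""

    def __init__(self, goal):
        self.goal = goal
        # adjacency: node -> {neighbor: (relation_label, confidence)}
        self.edges = defaultdict(dict)

    def add_relation(self, a, b, relation, confidence):
        # Store the relation symmetrically (a simplifying assumption).
        self.edges[a][b] = (relation, confidence)
        self.edges[b][a] = (relation, confidence)

    def update_from_observation(self, a, b, relation, confidence):
        # Dynamic update: keep the higher-confidence estimate for a pair.
        old = self.edges[a].get(b)
        if old is None or confidence > old[1]:
            self.add_relation(a, b, relation, confidence)

    def plan_to_goal(self, start):
        # Dijkstra with (1 - confidence) edge costs, so paths through
        # well-supported relations are preferred. Returns None if unreachable.
        dist = {start: 0.0}
        prev = {}
        pq = [(0.0, start)]
        while pq:
            d, node = heapq.heappop(pq)
            if node == self.goal:
                break
            if d > dist.get(node, float("inf")):
                continue
            for nbr, (_, conf) in self.edges[node].items():
                nd = d + (1.0 - conf)
                if nd < dist.get(nbr, float("inf")):
                    dist[nbr] = nd
                    prev[nbr] = node
                    heapq.heappush(pq, (nd, nbr))
        if self.goal not in prev and start != self.goal:
            return None
        path, node = [self.goal], self.goal
        while node != start:
            node = prev[node]
            path.append(node)
        return path[::-1]


g = DynamicSpatialRelationGraph("bed")
g.add_relation("living_room", "hallway", "adjacent", 0.9)
g.add_relation("hallway", "bedroom", "adjacent", 0.8)
g.add_relation("bedroom", "bed", "contains", 0.95)
print(g.plan_to_goal("living_room"))  # ['living_room', 'hallway', 'bedroom', 'bed']
```

A later high-confidence observation (e.g. the bed becoming directly visible) re-routes the plan without rebuilding the graph, which mirrors the paper's point that dynamic relation updates shrink the search space as observations accumulate.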
Problem

Research questions and friction points this paper is trying to address.

zero-shot object-goal navigation
spatial relationships
foundation models
visual perception
navigation planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

spatial relationships
zero-shot navigation
foundation models
dynamic graph
relation-aware planning
πŸ”Ž Similar Papers
No similar papers found.
Leyuan Fang
Professor in College of Electrical and Information Engineering, Hunan University
Image Processing, Weakly Supervised Learning, Optical Coherence Tomography, Remote Sensing Image Classification
Zan Mao
School of Artificial Intelligence and Robotics, Hunan University, Changsha, China, 410114
Zijing Wang
School of Artificial Intelligence and Robotics, Hunan University, Changsha, China, 410114
Yinlong Yan
School of Artificial Intelligence and Robotics, Hunan University, Changsha, China, 410114