🤖 AI Summary
This work addresses the challenge of generating dense, traversable viewpoint sequences for image-goal navigation from sparse image databases. We propose an end-to-end visual navigation method that requires neither metric localization nor environment-specific training. Our key innovation is the first integration of 3D Gaussian Splatting (3DGS) into visual navigation models (VNMs), enabling neural rendering-based viewpoint interpolation and path-guided synthesis to automatically construct continuous, traversable viewpoint sequences from start to goal. By bypassing SLAM and explicit geometric reconstruction, the approach significantly reduces storage overhead, requiring only sparse input images. Evaluated in photorealistic simulation environments, it achieves substantial improvements in navigation success rate and path efficiency while demonstrating strong robustness to varying degrees of image sparsity.
📝 Abstract
This paper presents a novel approach to image-goal navigation that integrates 3D Gaussian Splatting (3DGS) with Visual Navigation Models (VNMs), a method we refer to as GSplatVNM. VNMs offer a promising paradigm for image-goal navigation, guiding a robot through a sequence of point-of-view images without requiring metric localization or environment-specific training. However, constructing a dense and traversable sequence of target viewpoints from start to goal remains a central challenge, particularly when the available image database is sparse. To address this challenge, we propose a 3DGS-based viewpoint synthesis framework for VNMs that synthesizes intermediate viewpoints to seamlessly bridge gaps in sparse data while significantly reducing storage overhead. Experimental results in a photorealistic simulator demonstrate that our approach not only enhances navigation efficiency but also exhibits robustness under varying levels of image database sparsity.
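The core idea of bridging gaps between sparse database views is to generate intermediate camera poses along the path and render them with the 3DGS model. The paper does not specify the interpolation scheme; the sketch below is a minimal, hypothetical illustration of one common choice (linear interpolation of positions plus spherical linear interpolation of orientations). The function names and the pose representation are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    q0, q1 = q0 / np.linalg.norm(q0), q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:            # flip to take the shorter arc on the sphere
        q1, dot = -q1, -dot
    if dot > 0.9995:         # nearly parallel: linear interp is stable enough
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def interpolate_viewpoints(pose_a, pose_b, n_intermediate):
    """Generate intermediate camera poses between two sparse database views.

    Each pose is (position: shape-(3,) array, orientation: unit quaternion (w, x, y, z)).
    The returned poses would then be rendered by a trained 3DGS model to fill
    the gap in the viewpoint sequence fed to the VNM.
    """
    pos_a, quat_a = pose_a
    pos_b, quat_b = pose_b
    poses = []
    for i in range(1, n_intermediate + 1):
        t = i / (n_intermediate + 1)
        pos = (1 - t) * pos_a + t * pos_b       # linear position interpolation
        quat = slerp(quat_a, quat_b, t)         # spherical rotation interpolation
        poses.append((pos, quat))
    return poses
```

For example, interpolating one viewpoint halfway between a view at the origin and a view 2 m away rotated 90° about the vertical axis yields a pose at 1 m with a 45° rotation.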