🤖 AI Summary
Existing zero-shot object navigation (ZSON) methods are not robust enough in unknown, cluttered, heavily occluded, and dynamic environments. This paper proposes a trajectory-conditioned 3D world model that encodes egocentric visual observations and fuses multiple future predictions, enabling reasoning across occlusions and forecasting of moving-target trajectories. Inspired by Schrödinger’s thought experiment, it introduces a probabilistic “ensemble of future worlds” that models environmental uncertainty explicitly, removing the need for global mapping or explicit collision-avoidance planning. The approach integrates online value-map updating with end-to-end policy optimization and is validated on the Go2 quadrupedal robot. Experiments show improvements over state-of-the-art ZSON methods on three challenges: severe static occlusion, unknown hazards, and moving targets. The gains appear in self-localization accuracy, target localization success rate, and overall task completion rate.
📝 Abstract
Zero-shot object navigation (ZSON) requires a robot to locate a target object in a previously unseen environment without relying on pre-built maps or task-specific training. However, existing ZSON methods often struggle in realistic and cluttered environments, particularly when the scene contains heavy occlusions, unknown risks, or dynamically moving target objects. To address these challenges, we propose Schrödinger's Navigator, a navigation framework inspired by Schrödinger's thought experiment on uncertainty. The framework treats unobserved space as a set of plausible future worlds and reasons over them before acting. Conditioned on egocentric visual inputs and three candidate trajectories, a trajectory-conditioned 3D world model imagines future observations along each path. This enables the agent to see beyond occlusions and anticipate risks in unseen regions without requiring extra detours or dense global mapping. The imagined 3D observations are fused into the navigation map and used to update a value map. These updates guide the policy toward trajectories that avoid occlusions, reduce exposure to uncertain space, and better track moving targets. Experiments on a Go2 quadruped robot across three challenging scenarios, including severe static occlusions, unknown risks, and dynamically moving targets, show that Schrödinger's Navigator consistently outperforms strong ZSON baselines in self-localization, object localization, and overall Success Rate in occlusion-heavy environments. These results demonstrate the effectiveness of trajectory-conditioned 3D imagination in enabling robust zero-shot object navigation.
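The loop described in the abstract (imagine along each candidate trajectory, fuse the imagined observations into a value map, then prefer the best-scoring trajectory) can be illustrated with a toy sketch. This is not the authors' implementation: the 2D grid cells, the `ImaginedObservation` fields, the `target_likelihood - risk` value rule, the mean-value trajectory score, and the lookup-table `toy_world_model` are all assumptions made for illustration, standing in for the paper's 3D world model and learned policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int]  # hypothetical 2D grid cell; the paper works with 3D observations


@dataclass
class ImaginedObservation:
    """Assumed world-model output for one cell, both fields in [0, 1]."""
    risk: float
    target_likelihood: float


# Assumed interface: a world model maps a candidate trajectory to
# imagined observations for the cells that trajectory would reveal.
WorldModel = Callable[[List[Cell]], Dict[Cell, ImaginedObservation]]


def fuse_into_value_map(value_map: Dict[Cell, float],
                        imagined: Dict[Cell, ImaginedObservation],
                        risk_weight: float = 1.0) -> Dict[Cell, float]:
    """Return a copy of the value map with imagined cells overwritten.

    Assumed rule: value = target likelihood minus weighted risk.
    """
    updated = dict(value_map)
    for cell, obs in imagined.items():
        updated[cell] = obs.target_likelihood - risk_weight * obs.risk
    return updated


def score_trajectory(trajectory: List[Cell], value_map: Dict[Cell, float]) -> float:
    """Mean value along the trajectory; cells absent from the map count as 0."""
    if not trajectory:
        return float("-inf")
    return sum(value_map.get(cell, 0.0) for cell in trajectory) / len(trajectory)


def select_trajectory(candidates: List[List[Cell]],
                      value_map: Dict[Cell, float],
                      world_model: WorldModel) -> Tuple[int, Dict[Cell, float]]:
    """Imagine along every candidate, fuse the results, and pick the best index."""
    fused = dict(value_map)
    for trajectory in candidates:
        fused = fuse_into_value_map(fused, world_model(trajectory))
    scores = [score_trajectory(t, fused) for t in candidates]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return best_index, fused


def toy_world_model(trajectory: List[Cell]) -> Dict[Cell, ImaginedObservation]:
    """Stand-in for a learned model: a fixed table of occluded cells."""
    hidden = {
        (2, 0): ImaginedObservation(risk=0.9, target_likelihood=0.0),  # hazard
        (2, 2): ImaginedObservation(risk=0.0, target_likelihood=0.8),  # likely target
    }
    return {cell: hidden[cell] for cell in trajectory if cell in hidden}


# Three candidate trajectories, matching the count stated in the abstract.
candidates = [
    [(1, 0), (2, 0)],  # ends in an imagined hazard
    [(1, 1), (2, 1)],  # reveals nothing new
    [(1, 2), (2, 2)],  # ends where the target is imagined
]
best_index, fused_map = select_trajectory(candidates, {}, toy_world_model)
```

In this toy setup the third candidate is selected, because its imagined cell carries a high target likelihood and no risk, while the first is penalized for the imagined hazard. A real system would replace the lookup table with the trajectory-conditioned world model and the mean-value score with the learned navigation policy.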