🤖 AI Summary
This work challenges the overattribution of planning capability to large language models (LLMs) in ObjectGoal Navigation, asking whether reported performance gains stem from geometric priors rather than genuine language reasoning. To isolate these factors, we propose a Distance-Weighted Frontier Explorer (DWFE) and a lightweight Semantic Heuristic Filter (SHF), and conduct training-free navigation experiments on the HM3D-v1 validation set. Results show that frontier geometry alone improves Success from 58.0% to 61.1% and SPL from 20.9% to 36.0% over 2,000 validation episodes; on a 200-episode subset, adding SHF further shortens paths by five steps on average. Our analysis indicates that geometry-informed heuristics, not LLM-based semantic reasoning, are the primary driver of performance gains. This finding challenges prevailing assumptions about LLM-driven navigational intelligence and suggests that metric-aware prompts or offline semantic graphs are needed to achieve truly semantic navigation.
📝 Abstract
Large language models (LLMs) are often credited with recent leaps in ObjectGoal Navigation, yet the extent to which they improve planning remains unclear. We revisit this question on the HM3D-v1 validation split. First, we strip InstructNav of its Dynamic Chain-of-Navigation prompt, open-vocabulary GLEE detector, and Intuition saliency map, and replace them with a simple Distance-Weighted Frontier Explorer (DWFE). This geometry-only heuristic raises Success from 58.0% to 61.1% and lifts SPL from 20.9% to 36.0% over 2,000 validation episodes, outperforming all previous training-free baselines. Second, we add a lightweight language prior (SHF); on a 200-episode subset this yields a further +2% Success and +0.9% SPL while shortening paths by five steps on average. Qualitative trajectories confirm the trend: InstructNav backtracks and times out, DWFE reaches the goal after a few islands, and SHF follows an almost straight route. Our results indicate that frontier geometry, not emergent LLM reasoning, drives most reported gains, and suggest that metric-aware prompts or offline semantic graphs are necessary before attributing navigation success to "LLM intelligence."
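The two metrics quoted above, Success and SPL, can be made concrete with a short sketch. It uses the standard definition of SPL (Success weighted by Path Length): each successful episode contributes the ratio of the shortest-path length to the longer of the agent's path and the shortest path, and failed episodes contribute zero. The `Episode` record and its field names are hypothetical and chosen only for illustration; this is not the authors' evaluation code, and the numbers in the usage note are made up.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are illustrative assumptions."""
    success: bool          # did the agent stop at the goal object?
    shortest_path: float   # shortest start-to-goal distance (assumed > 0)
    agent_path: float      # length of the path the agent actually travelled


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(1 for e in episodes if e.success) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length, averaged over all episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if e.success:
            # The ratio is capped at 1: an agent path shorter than the
            # reference shortest path earns no extra credit.
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)
```

For example, with the made-up episodes `Episode(True, 5.0, 10.0)`, `Episode(True, 4.0, 4.0)` and `Episode(False, 3.0, 9.0)`, Success is 2/3 while SPL is (0.5 + 1.0 + 0) / 3 = 0.5. Because SPL penalizes detours, a method can raise SPL substantially with only a modest change in Success, which is the pattern reported for DWFE.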