🤖 AI Summary
This study investigates how the linguistic structure and contextual features of textual spatial representations influence navigation planning in large language models (LLMs), challenging the assumption that such representations are neutral engineering choices. The authors propose a dual-intervention framework that disentangles linguistic structure from contextual factors. Representation intervention manipulates language format and compression level, while context intervention probes the effects of combined and conflicting contextual cues. Systematic experiments across multiple model scales and diverse spatial reasoning tasks reveal three findings. Topological information is fundamental to robust planning. The efficacy of a language format depends on model scale, task type, and compression level. Incorrect semantic cues systematically degrade planning performance. These findings yield core design principles for textual spatial representations: preserve topology, calibrate compression to model capacity, and ensure semantic fidelity.
📝 Abstract
Large Language Model (LLM)-based navigation systems commonly construct explicit spatial representations (e.g., topological graphs, semantic raster maps) and translate them into textual descriptions that serve as the LLM's input. However, the linguistic structure of these text-based spatial representations, and the choice of contextual features (e.g., topology, geometry) they contain, are often treated as neutral engineering decisions rather than as key factors that shape LLM behavior. To fill this gap, we propose a dual-intervention framework that disentangles linguistic structure from contextual cues in order to evaluate the linguistic inductive bias of LLMs for navigation planning. Within the framework, representation intervention varies the linguistic format and the degree of linguistic compression, clarifying when textual representations support or inhibit navigation planning. Context intervention, through contextual feature combination and conflict probing, reveals the preferences and weaknesses of LLMs when processing different contextual cues. Experiments across diverse spatial reasoning tasks and multiple model scales reveal a consistent pattern. Topological information is a sturdy shield and the backbone of robust planning. Linguistic format is a double-edged sword whose effect depends on model size, task demands, and compression level. Semantic information is an Achilles' heel, since incorrect semantic cues can systematically derail the planning process. Overall, our study shows that effective text-based spatial representations for LLM-based navigation should preserve topological integrity, calibrate representational compression to model capacity, and ensure semantic correctness, rather than simply adopting a single representation. Our code is publicly available at https://github.com/jonesdong150/LLM-Navigation-Inductive-Bias.
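To make the idea of representation intervention concrete, here is a minimal sketch that serializes one toy topological graph into three textual formats of decreasing verbosity. It illustrates the kind of variation an LLM planner might receive as input. The graph, the node names, the format functions (`to_natural_language`, `to_adjacency_list`, `to_edge_list`), and the BFS helper `shortest_path` are hypothetical illustrations. They are not the encodings or code from the paper's repository. Character count is used only as a crude stand-in for compression level.

```python
from collections import deque

# Hypothetical toy topological map: node -> list of neighboring nodes.
# Node names and the three renderings below are illustrative assumptions,
# not the formats used in the paper.
GRAPH = {
    "kitchen": ["hallway"],
    "hallway": ["kitchen", "bedroom", "bathroom"],
    "bedroom": ["hallway"],
    "bathroom": ["hallway"],
}


def to_natural_language(graph):
    """Verbose format: one full sentence per node (least compressed)."""
    lines = []
    for node, neighbors in graph.items():
        lines.append(f"The {node} is connected to the {', the '.join(neighbors)}.")
    return "\n".join(lines)


def to_adjacency_list(graph):
    """Structured format: 'node: n1, n2' on each line (intermediate)."""
    return "\n".join(f"{node}: {', '.join(nbrs)}" for node, nbrs in graph.items())


def to_edge_list(graph):
    """Compact format: each undirected edge listed once as 'a-b' (most compressed)."""
    edges = sorted({tuple(sorted((a, b))) for a, nbrs in graph.items() for b in nbrs})
    return "; ".join(f"{a}-{b}" for a, b in edges)


def shortest_path(graph, start, goal):
    """Breadth-first search over the topology; returns a shortest node path
    (or None), usable as a ground-truth plan when scoring an LLM's route."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nbr in graph[path[-1]]:
            if nbr not in visited:
                visited.add(nbr)
                queue.append(path + [nbr])
    return None


if __name__ == "__main__":
    # Same topology, three renderings: only the linguistic form changes.
    for render in (to_natural_language, to_adjacency_list, to_edge_list):
        text = render(GRAPH)
        print(f"--- {render.__name__} ({len(text)} chars)\n{text}")
    print(shortest_path(GRAPH, "kitchen", "bathroom"))
```

The topology stays fixed while only the rendering changes, which isolates the effect of linguistic form. The BFS path gives a reference plan against which a model's proposed route could be checked.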