🤖 AI Summary
This work addresses the limited ability of vision-language models (VLMs) to perform allocentric spatial reasoning, that is, to infer spatial relationships from the perspective of an object in the scene rather than from that of the observer. To overcome this, the authors propose the Symbolic Projective Layout (SymPL) framework, which reformulates allocentric spatial reasoning as a symbolic-layout problem, a form that VLMs already handle well. SymPL builds this structured representation from four key factors: projection, abstraction, bipartition, and localization. Experimental results show that SymPL substantially improves performance on both allocentric and egocentric spatial reasoning tasks and makes models more robust under visual illusions and in multi-view scenarios. These findings suggest that symbolic layout representations are an effective route to complex perspective-aware spatial understanding.
📝 Abstract
Perspective-aware spatial reasoning involves understanding spatial relationships from a specific viewpoint, either egocentric (observer-centered) or allocentric (object-centered). While vision-language models (VLMs) perform well in egocentric settings, their performance deteriorates when they must reason from allocentric viewpoints, where spatial relations have to be inferred from the perspective of objects within the scene. In this study, we address this underexplored challenge by introducing Symbolic Projective Layout (SymPL), a framework that reformulates allocentric reasoning into symbolic-layout forms that VLMs inherently handle well. By leveraging four key factors (projection, abstraction, bipartition, and localization), SymPL converts allocentric questions into structured symbolic-layout representations. Extensive experiments demonstrate that this reformulation substantially improves performance on both allocentric and egocentric tasks and enhances robustness under visual illusions and in multi-view scenarios; they also show that each component contributes critically to these gains. These results show that SymPL provides an effective and principled approach to complex perspective-aware spatial reasoning.
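To make the egocentric/allocentric distinction concrete, the sketch below answers one left/right question from two viewpoints. It is not SymPL and does not reproduce the authors' pipeline. It assumes a hypothetical top-down scene in which every object's ground-plane position and facing direction (`heading`, in radians, counterclockwise from the +x axis) are already known; the abstract does not say how such quantities would be obtained. The names `SceneObject`, `to_reference_frame`, and `side_from` are hypothetical and only loosely echo the abstract's "projection" and "bipartition" factors.

```python
import math
from dataclasses import dataclass


@dataclass
class SceneObject:
    """Hypothetical top-down scene entry: ground-plane position plus facing direction."""
    name: str
    x: float
    y: float
    heading: float  # radians, counterclockwise from the +x axis (assumed to be known)


def to_reference_frame(ref: SceneObject, target: SceneObject) -> tuple[float, float]:
    """Express target's position in ref's own frame as (forward, left) offsets."""
    dx, dy = target.x - ref.x, target.y - ref.y
    cos_h, sin_h = math.cos(ref.heading), math.sin(ref.heading)
    forward = dx * cos_h + dy * sin_h  # component along ref's facing direction
    left = -dx * sin_h + dy * cos_h    # component along ref's left-hand direction
    return forward, left


def side_from(ref: SceneObject, target: SceneObject, eps: float = 1e-9) -> str:
    """Split the plane into two halves around ref's facing direction: left vs right."""
    _, left = to_reference_frame(ref, target)
    if left > eps:
        return "left"
    if left < -eps:
        return "right"
    return "ahead or behind"  # target lies on ref's forward/backward axis


# Camera looks toward +y; the car faces the camera (toward -y); the ball sits at x = 1.
camera = SceneObject("camera", 0.0, -5.0, math.pi / 2)
car = SceneObject("car", 0.0, 0.0, -math.pi / 2)
ball = SceneObject("ball", 1.0, 0.0, 0.0)

print("egocentric (camera):", side_from(camera, ball))  # right
print("allocentric (car):", side_from(car, ball))       # left
```

The same ball is on the camera's right but on the car's left, because the car faces the camera. An allocentric question therefore requires a change of reference frame rather than reading sides directly off the image, which is the kind of inference the abstract reports VLMs struggle with.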