LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning

📅 2025-06-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limited reasoning capability of large language models (LLMs) in complex real-world decision-making, exemplified by site selection, by introducing LocationReasoner, a benchmark tailored to realistic, multi-constraint location-selection scenarios. The benchmark comprises over 300 queries of varying difficulty, whose hierarchical, heterogeneous constraints span spatial, environmental, and logistical domains, and it integrates a sandbox environment with in-house tools for constraint-based location search. Methodologically, it evaluates agentic strategies (e.g., ReAct, Reflexion) alongside direct code-generation prompting. Experiments reveal that state-of-the-art reasoning models offer limited improvement over their non-reasoning predecessors, with even the latest OpenAI o4 failing on 30% of site selection tasks; moreover, agentic strategies often degrade performance through excessive, ungrounded reasoning. By systematically exposing LLMs' bottlenecks in holistic, non-linear real-world decision-making, the study establishes a reproducible benchmark for evaluating and advancing grounded reasoning in real-world decision tasks.

📝 Abstract
Recent advances in large language models (LLMs), particularly those enhanced through reinforced post-training, have demonstrated impressive reasoning capabilities, as exemplified by models such as OpenAI o1 and DeepSeek-R1. However, these capabilities are predominantly benchmarked on domains like mathematical problem solving and code generation -- leaving open the question of whether such reasoning skills generalize to complex, real-world scenarios. In this paper, we introduce LocationReasoner, a benchmark designed to evaluate LLMs' reasoning abilities in the context of real-world site selection, where models must identify feasible locations by reasoning over diverse and complicated spatial, environmental, and logistical constraints. The benchmark comprises over 300 carefully crafted queries of varying difficulty levels, supported by a sandbox environment with in-house tools for constraint-based location search. Extensive evaluations reveal that state-of-the-art reasoning models offer limited improvement over their non-reasoning predecessors in real-world contexts, with even the latest OpenAI o4 model failing on 30% of site selection tasks. Moreover, agentic strategies such as ReAct and Reflexion often suffer from over-reasoning, leading to worse outcomes than direct code-generation prompting. With key limitations of LLMs in holistic and non-linear reasoning highlighted, we release LocationReasoner to foster the development of LLMs and agents capable of robust, grounded reasoning in real-world decision-making tasks. Codes and data for our benchmark are available at https://github.com/miho-koda/LocationReasoner.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' reasoning in real-world site selection scenarios
Assessing LLMs' ability to handle complex spatial and logistical constraints
Identifying limitations of current models in holistic, non-linear reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

LocationReasoner benchmark for real-world site selection
Sandbox environment with constraint-based location tools
Evaluates LLMs on holistic non-linear reasoning tasks
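The constraint-based location search at the core of the sandbox can be pictured as conjunctive filtering over candidate sites. The sketch below is illustrative only: the `Site` fields and the three predicates are hypothetical stand-ins, not the benchmark's actual tool API, which spans far richer spatial, environmental, and logistical constraints.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    rent: float          # monthly rent, USD (hypothetical attribute)
    foot_traffic: int    # daily pedestrians (hypothetical attribute)
    near_highway: bool   # logistical access flag (hypothetical attribute)

# Hypothetical constraint predicates; the real benchmark composes
# hierarchical constraints across spatial, environmental, and
# logistical domains.
constraints = [
    lambda s: s.rent <= 5000,
    lambda s: s.foot_traffic >= 1000,
    lambda s: s.near_highway,
]

def feasible(sites, constraints):
    """Return the sites satisfying every constraint (conjunctive filter)."""
    return [s for s in sites if all(c(s) for c in constraints)]

candidates = [
    Site("Downtown A", 4500, 2500, True),
    Site("Suburb B", 3000, 400, True),
    Site("Midtown C", 6000, 3000, False),
]
print([s.name for s in feasible(candidates, constraints)])  # ['Downtown A']
```

In the direct code-generation setting the paper evaluates, the LLM would emit filtering logic of roughly this shape against the sandbox tools, rather than reasoning over each constraint in free-form text.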
Miho Koda
Massachusetts Institute of Technology, Cambridge, MA, USA
Yu Zheng
Massachusetts Institute of Technology, Cambridge, MA, USA
Ruixian Ma
Massachusetts Institute of Technology, Cambridge, MA, USA
Mingyang Sun
Massachusetts Institute of Technology, Cambridge, MA, USA
Devesh Pansare
Massachusetts Institute of Technology, Cambridge, MA, USA
Fabio Duarte
Massachusetts Institute of Technology, Cambridge, MA, USA
Paolo Santi
IIT-CNR, Pisa, Italy and MIT Senseable City Lab
Wireless Network Algorithms · Vehicular Networks · Smart Transportation · Urban Science