🤖 AI Summary
This paper addresses the geocoding challenge posed by compositional location references (e.g., “the Starbucks northwest of Zhongguancun Metro Station”). We propose a novel two-stage LLM-based approach: first decoupling spatial knowledge from logical reasoning capabilities, then jointly optimizing both via prompt engineering and supervised fine-tuning. Our key innovation is a knowledge-reasoning separation architecture that enables lightweight fine-tuned models (e.g., 7B-parameter variants) to match the accuracy of much larger general-purpose LLMs on compositional reference geocoding. Experiments on multiple real-world datasets show an average 12.6% improvement in geocoding accuracy over baselines, including rule-based systems and end-to-end fine-tuned models, validating the feasibility of domain-specialized small models for geographic semantic parsing.
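To make the knowledge-reasoning separation concrete, the toy sketch below stands in for the two stages with deterministic code: a coordinate lookup plays the role of the spatial-knowledge stage, and a compass-bearing offset plays the role of the reasoning stage. This is purely illustrative and is not the paper's method (the paper uses LLMs for both stages); the knowledge-base entry, the 0.5 km default distance, and all function names are assumptions for the example, and the coordinates are approximate.

```python
import math

# Toy stand-in for the spatial-knowledge stage: a lookup table of
# landmark coordinates as (longitude, latitude). Values are approximate.
KNOWLEDGE_BASE = {
    "Zhongguancun Metro Station": (116.3167, 39.9833),
}

# Compass bearings in degrees, measured clockwise from north.
BEARINGS = {
    "north": 0, "northeast": 45, "east": 90, "southeast": 135,
    "south": 180, "southwest": 225, "west": 270, "northwest": 315,
}

def lookup_anchor(name):
    """Stage 1 (knowledge): resolve the anchor landmark to coordinates."""
    return KNOWLEDGE_BASE[name]

def apply_relation(anchor, direction, distance_km=0.5):
    """Stage 2 (reasoning): offset the anchor along the named bearing.

    Uses a flat-earth approximation (~111 km per degree of latitude),
    which is adequate at neighborhood scale.
    """
    lon, lat = anchor
    theta = math.radians(BEARINGS[direction])
    dlat = (distance_km / 111.0) * math.cos(theta)
    dlon = (distance_km / (111.0 * math.cos(math.radians(lat)))) * math.sin(theta)
    return (lon + dlon, lat + dlat)

def geocode(direction, anchor_name):
    """Compose the two stages to resolve e.g. 'northwest of <anchor>'."""
    return apply_relation(lookup_anchor(anchor_name), direction)
```

For the running example, `geocode("northwest", "Zhongguancun Metro Station")` returns a point north (larger latitude) and west (smaller longitude) of the station, mirroring how the reasoning stage transforms the knowledge stage's output.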
📝 Abstract
Geocoding is the task of linking a location reference to an actual geographic location and is essential for many downstream analyses of unstructured text. In this paper, we explore the challenging setting of geocoding compositional location references. Building on recent work demonstrating LLMs' ability to reason over geospatial data, we evaluate the geospatial knowledge and reasoning skills of LLMs relevant to our task. Based on these insights, we propose an LLM-based strategy for geocoding compositional location references. We show that our approach improves performance on the task and that a relatively small fine-tuned LLM can achieve performance comparable to much larger off-the-shelf models.