🤖 AI Summary
Existing urban foundation models suffer from significant geospatial bias, producing regionally imbalanced predictions and generalizing poorly, which hinders the cross-domain understanding and reasoning that Urban General Intelligence (UGI) requires in complex urban environments. To address this, we propose Urban-R1, the first reinforcement learning (RL) post-training framework for urban multimodal large language models (MLLMs). Urban-R1 integrates Group Relative Policy Optimization (GRPO) with an urban region profiling auxiliary task, explicitly modeling geographic heterogeneity to mitigate spatial bias. Leveraging diverse multi-source urban data, Urban-R1 substantially improves fairness and out-of-distribution generalization across cross-regional urban understanding, planning, and reasoning tasks, surpassing both supervised fine-tuning baselines and leading proprietary models on multiple benchmarks. Urban-R1 establishes a new paradigm for building robust, generalizable UGI systems capable of equitable, context-aware urban intelligence.
📝 Abstract
Rapid urbanization intensifies the demand for Urban General Intelligence (UGI), referring to AI systems that can understand and reason about complex urban environments. Recent studies have built urban foundation models using supervised fine-tuning (SFT) of LLMs and MLLMs, yet these models exhibit persistent geospatial bias, producing regionally skewed predictions and limited generalization. To address this, we propose Urban-R1, a reinforcement learning-based post-training framework that aligns MLLMs with the objectives of UGI. Urban-R1 adopts Group Relative Policy Optimization (GRPO) to optimize reasoning across geographic groups and employs urban region profiling as a proxy task to provide measurable rewards from multimodal urban data. Extensive experiments across diverse regions and tasks show that Urban-R1 effectively mitigates geo-bias and improves cross-region generalization, outperforming both SFT-trained and closed-source models. Our results highlight reinforcement learning alignment as a promising pathway toward equitable and trustworthy urban intelligence.
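The core idea behind GRPO is to score each sampled response relative to the other responses in its group, replacing a learned value (critic) model with simple within-group reward normalization. The sketch below illustrates that standard group-relative advantage computation; it is an assumption-laden illustration of the general GRPO formulation, not the Urban-R1 authors' actual implementation, and the function name `group_relative_advantages` and the example rewards are invented for this example.

```python
# Minimal sketch of GRPO-style group-relative advantages (illustrative only;
# not Urban-R1's actual code). For one prompt, a group of G responses is
# sampled and each response's reward is standardized within the group.
from statistics import mean, stdev


def group_relative_advantages(rewards, eps=1e-8):
    """Return each reward standardized against its group's mean and std.

    This within-group normalization is what lets GRPO dispense with a
    separate critic: the group itself serves as the baseline.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 sampled answers to one region-profiling query:
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
# Above-average answers get positive advantage, below-average negative.
```

Because the advantages are centered within each group, responses are reinforced only insofar as they beat their peers on the same prompt, which is what makes a measurable proxy task (here, urban region profiling) a natural source of rewards.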