🤖 AI Summary
This work addresses a limitation of existing foundation models for urban socioeconomic prediction: they rely predominantly on static place attributes and neglect the human mobility patterns that reflect functional interconnections among areas. To bridge this gap, the authors propose MobFusion, a modular framework that integrates urban mobility networks into foundation models through three complementary mechanisms: enriching zero-shot LLM prompts with mobility-informed context, fusing geospatial visual embeddings with textual embeddings via a graph-based connector, and encoding mobility structures as tokens for multimodal LLM reasoning. Leveraging large-scale anonymized mobility data, large language models, and graph neural networks, MobFusion improves the prediction of key urban indicators, including median household income, population density, and crime, across three major U.S. metropolitan areas.
📝 Abstract
Foundation models have recently been applied to urban socioeconomic prediction using POI text, satellite imagery, and geospatial descriptions. However, these models mostly rely on static attributes of individual places and ignore the mobility patterns that reveal how places are functionally connected. To address this gap, we explore whether mobility networks can elicit the geospatial capabilities of foundation models by explicitly encoding connectivity among urban entities. We propose MobFusion, a modular mobility-enhanced foundation model fusion paradigm, and instantiate it through three complementary designs that use mobility networks (i) as contexts for zero-shot LLM prompting, (ii) as graph connectors for fusing geospatial visual embeddings with textual embeddings, and (iii) as structured tokens for multimodal LLM reasoning. Using anonymized large-scale mobility datasets from three U.S. metropolitan areas, we find that MobFusion improves urban prediction tasks (e.g., median household income, population density, and crime prediction) across all three instantiations, demonstrating that incorporating human mobility can effectively improve the socioeconomic understanding of foundation models.
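To make design (i) concrete, the following is a minimal sketch of how a mobility network might be turned into context for a zero-shot LLM prompt. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the prompt format, so the flow representation (a list of origin, destination, trip-count triples), the function names `top_connected_places` and `build_prompt`, and the prompt wording are all hypothetical.

```python
from collections import defaultdict


def top_connected_places(flows, place, k=3):
    """Return the k places with the largest total flow to or from `place`.

    `flows` is an assumed format: an iterable of (origin, destination, trips).
    """
    totals = defaultdict(float)
    for origin, dest, trips in flows:
        if origin == dest:
            continue  # ignore self-loops; they say nothing about connectivity
        if origin == place:
            totals[dest] += trips
        elif dest == place:
            totals[origin] += trips
    # Sort by descending flow, breaking ties alphabetically for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def build_prompt(place, description, flows,
                 target="median household income", k=3):
    """Assemble a zero-shot prompt that adds mobility context to a place description."""
    lines = [
        f"Place: {place}",
        f"Description: {description}",
        "Most strongly connected places by visit flow:",
    ]
    for name, trips in top_connected_places(flows, place, k):
        lines.append(f"- {name}: {trips:.0f} trips")
    lines.append(f"Question: Estimate the {target} of {place}.")
    return "\n".join(lines)


# Hypothetical toy flows between four areas.
flows = [("A", "B", 10), ("B", "A", 5), ("A", "C", 3), ("D", "A", 7), ("C", "D", 100)]
prompt = build_prompt("A", "Residential area with a few retail POIs.", flows, k=2)
print(prompt)
```

The resulting string would be passed to an LLM in place of a prompt built from the static description alone, so the model also sees which areas the target place exchanges visitors with.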