Urban In-Context Learning: Bridging Pretraining and Inference through Masked Diffusion for Urban Profiling

πŸ“… 2025-08-04
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the weak generalization and the misalignment between pretraining and inference in two-stage urban profiling paradigms (representation learning followed by linear probing), this paper proposes UrbanMDT, a unified, single-stage framework that jointly optimizes pretraining and inference. Its core innovation is introducing masked diffusion modeling into urban representation learning, realized via the Urban Masked Diffusion Transformer. The framework additionally employs a representation alignment mechanism that regularizes intermediate features against those of classical urban analytical methods, enabling end-to-end in-context learning over non-linguistic, structured urban data. UrbanMDT achieves significant improvements over state-of-the-art two-stage methods across two cities and three fine-grained profiling indicators. Ablation studies confirm that the masked diffusion mechanism is particularly critical for distributional prediction performance.

πŸ“ Abstract
Urban profiling aims to predict urban profiles in unknown regions and plays a critical role in economic and social censuses. Existing approaches typically follow a two-stage paradigm: first, learning representations of urban areas; second, performing downstream prediction via linear probing, a practice that originates from the BERT era. Inspired by the development of GPT-style models, recent studies have shown that novel self-supervised pretraining schemes can endow models with direct applicability to downstream tasks, thereby eliminating the need for task-specific fine-tuning. This is largely because GPT unifies the form of pretraining and inference through next-token prediction. However, urban data exhibit structural characteristics that differ fundamentally from language, making it challenging to design a one-stage model that unifies pretraining and inference. In this work, we propose Urban In-Context Learning, a framework that unifies pretraining and inference via a masked autoencoding process over urban regions. To capture the distribution of urban profiles, we introduce the Urban Masked Diffusion Transformer, which enables each region's prediction to be represented as a distribution rather than a deterministic value. Furthermore, to stabilize diffusion training, we propose the Urban Representation Alignment Mechanism, which regularizes the model's intermediate features by aligning them with those from classical urban profiling methods. Extensive experiments on three indicators across two cities demonstrate that our one-stage method consistently outperforms state-of-the-art two-stage approaches. Ablation and case studies further validate the effectiveness of each proposed module, particularly the use of diffusion modeling.
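The masked autoencoding-plus-diffusion setup the abstract describes can be illustrated with a toy sketch. All names, shapes, and the variance-preserving noise schedule below are illustrative assumptions, not the paper's actual implementation: a random subset of regions is masked out (these are the "unknown" regions to predict), while the remaining regions' profile vectors are diffused toward Gaussian noise at a sampled timestep, giving the model a denoising target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 10 urban regions, each described by a 4-dim profile vector
profiles = rng.normal(size=(10, 4))

def mask_and_noise(x, mask_ratio=0.5, t=0.3, rng=rng):
    """Forward-process sketch: hide a random subset of regions and
    diffuse the visible profiles toward Gaussian noise.

    t in (0, 1) plays the role of a normalized diffusion timestep:
    t=0 keeps the clean signal, t=1 yields pure noise.
    """
    n = x.shape[0]
    n_masked = int(n * mask_ratio)
    masked_idx = rng.choice(n, size=n_masked, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[masked_idx] = True

    noise = rng.normal(size=x.shape)
    # Variance-preserving interpolation between signal and noise
    noisy = np.sqrt(1.0 - t) * x + np.sqrt(t) * noise

    # Masked regions carry no signal at all; the model must infer them
    # from the (noisy) context of visible regions.
    visible = np.where(mask[:, None], 0.0, noisy)
    return visible, mask, noise

visible, mask, noise = mask_and_noise(profiles)
print(mask.sum())  # number of masked regions
```

Because pretraining (reconstruct masked regions) and inference (predict profiles for unknown regions) share this one objective, no separate linear-probing stage is needed, which is the single-stage unification the paper argues for.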
Problem

Research questions and friction points this paper is trying to address.

Predict urban profiles in unknown regions accurately
Unify pretraining and inference for urban data
Stabilize diffusion training for urban profiling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Urban In-Context Learning unifies pretraining and inference
Urban Masked Diffusion Transformer predicts profile distributions
Urban Representation Alignment Mechanism stabilizes diffusion training
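The alignment idea in the last bullet, regularizing intermediate features against those of a classical urban profiling encoder, can be sketched as a cosine-similarity penalty. The loss form below is an assumption for illustration; the paper does not specify this exact formulation here.

```python
import numpy as np

def alignment_loss(model_feats, reference_feats, eps=1e-8):
    """Cosine-alignment sketch: encourage the model's intermediate
    features to point in the same direction as features from a frozen,
    classical urban-profiling encoder.

    Returns 1 - mean cosine similarity, so 0 means perfect alignment
    and 2 means the features are exactly opposed.
    """
    a = model_feats / (np.linalg.norm(model_feats, axis=1, keepdims=True) + eps)
    b = reference_feats / (np.linalg.norm(reference_feats, axis=1, keepdims=True) + eps)
    cos = np.sum(a * b, axis=1)
    return 1.0 - cos.mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))
print(alignment_loss(feats, feats))   # identical features -> ~0.0
print(alignment_loss(feats, -feats))  # opposed features   -> ~2.0
```

Adding such a term to the diffusion objective anchors intermediate representations to a known-good feature space, which is one plausible reading of how the mechanism stabilizes diffusion training.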
Ruixing Zhang
State Key Laboratory of Complex and Critical Software Environment, Beihang University
Bo Wang
State Key Laboratory of Complex and Critical Software Environment, Beihang University
Tongyu Zhu
State Key Laboratory of Complex and Critical Software Environment, Beihang University
Leilei Sun
Beihang University
Data Mining · Machine Learning · Graph Learning
Weifeng Lv
State Key Laboratory of Complex and Critical Software Environment, Beihang University