OSMGraphCLIP: Learning Global Location Representations from OpenStreetMap Graphs

📅 2026-06-06
🤖 AI Summary
This work proposes a method for building universal global location representations from OpenStreetMap (OSM) data alone, supporting diverse geospatial applications without relying on remote sensing imagery. Geographic environments are modeled as heterogeneous graphs of roads, buildings, land-use regions, and points of interest; a multi-scale graph encoder over these graphs is paired with a spherical-harmonics location encoder. Notably, it applies a CLIP-style contrastive learning framework to OSM graph data for the first time, using only map topology and semantic information to produce location embeddings. Evaluated across seven downstream tasks spanning climate, ecology, socioeconomic indicators, and public health, the method matches or exceeds satellite-image baselines on most benchmarks, with its clearest advantage on socioeconomic and public health tasks, which suggests that OSM encodes patterns of human activity well.
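To make the heterogeneous-graph idea concrete, here is a minimal sketch of typed nodes and typed edges. It is not the authors' data structure: the class `HeteroGraph`, the node types (`road`, `building`, `poi`, `landuse`), and the relation names (`adjacent_to`, `within`) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Toy heterogeneous graph; every type and relation name is illustrative."""
    # node_type -> list of attribute dicts (list index is the node id within its type)
    nodes: dict = field(default_factory=lambda: defaultdict(list))
    # (src_type, relation, dst_type) -> list of (src_id, dst_id) pairs
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, node_type, **tags):
        """Store an OSM-like feature with its tags and return its per-type id."""
        self.nodes[node_type].append(tags)
        return len(self.nodes[node_type]) - 1

    def add_edge(self, src_type, src, relation, dst_type, dst):
        """Add a typed edge after checking that both endpoints exist."""
        if not 0 <= src < len(self.nodes.get(src_type, [])):
            raise IndexError(f"unknown {src_type} node {src}")
        if not 0 <= dst < len(self.nodes.get(dst_type, [])):
            raise IndexError(f"unknown {dst_type} node {dst}")
        self.edges[(src_type, relation, dst_type)].append((src, dst))

    def node_counts(self):
        """Number of nodes per type, a crude summary of local composition."""
        return {t: len(items) for t, items in self.nodes.items()}


# Hypothetical neighborhood: one road, one building, one POI, one land-use region.
g = HeteroGraph()
road = g.add_node("road", highway="residential")
school = g.add_node("building", building="school")
cafe = g.add_node("poi", amenity="cafe")
area = g.add_node("landuse", landuse="residential")
g.add_edge("building", school, "adjacent_to", "road", road)
g.add_edge("poi", cafe, "within", "landuse", area)
```

Keeping edges keyed by a (source type, relation, destination type) triple preserves both the topology and the semantics of each relationship, which is the property the summary attributes to the heterogeneous-graph representation.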
📝 Abstract
We present OSMGraphCLIP, a CLIP-style geospatial representation model that learns global location embeddings from freely available OpenStreetMap (OSM) data. OSMGraphCLIP represents geographic environments as heterogeneous graphs of typed OSM features, preserving the topological and semantic relationships among roads, buildings, land-use regions, and points of interest. A multi-scale graph encoder captures both fine-grained local structure and broader landscape composition, and supervises a spherical-harmonics location encoder through a contrastive alignment objective. We evaluate OSMGraphCLIP across a diverse suite of downstream geospatial regression and classification tasks spanning climate, ecology, socioeconomic indicators, public health, land cover, biodiversity, and wildfire forecasting, and show that structured OSM data alone supports strong global location representations across domains. OSMGraphCLIP matches or exceeds satellite-based baselines on the majority of benchmarks, with the most pronounced advantage on socioeconomic and public-health tasks, where OSM's explicit semantic annotation of the built environment encodes patterns of human activity that satellite pixels can only capture indirectly. On ecological and environmental tasks, the model remains closely competitive with imagery-based methods despite using no Earth observation data. Qualitative analysis confirms that the learned embeddings organize geographic space coherently, recovering biome boundaries, urban gradients, and tropical-temperate distinctions from map topology alone.
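The contrastive alignment described in the abstract can be sketched as follows. This is an assumption-laden illustration rather than the paper's implementation: `sh_features` computes only the real spherical harmonics up to degree 1 (a real location encoder would use higher degrees and a learned network on top), `clip_loss` is the standard symmetric InfoNCE loss used by CLIP, and the temperature of 0.07 is an assumed default, not a value taken from the paper.

```python
import math


def sh_features(lat_deg, lon_deg):
    """Real spherical harmonics of degree 0 and 1 at a latitude/longitude."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Unit vector on the sphere.
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    c0 = 0.5 * math.sqrt(1.0 / math.pi)        # Y_0^0
    c1 = math.sqrt(3.0 / (4.0 * math.pi))      # shared degree-1 constant
    return [c0, c1 * y, c1 * z, c1 * x]        # Y_0^0, Y_1^-1, Y_1^0, Y_1^1


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / n for a in v]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(r - m) for r in row))
    return [r - lse for r in row]


def clip_loss(graph_emb, loc_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of graph_emb is the positive for row i of loc_emb."""
    g = [normalize(v) for v in graph_emb]
    l = [normalize(v) for v in loc_emb]
    n = len(g)
    # Cosine similarities scaled by the (assumed) temperature.
    logits = [[sum(a * b for a, b in zip(g[i], l[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Graph-to-location direction: softmax over each row.
    g2l = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Location-to-graph direction: softmax over each column.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    l2g = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (g2l + l2g)


# Toy batch: two graph embeddings paired with matching vs. swapped location embeddings.
graph_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_loss(graph_batch, [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_loss(graph_batch, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the aligned pairing yields a lower loss than the swapped one, which is the signal that would push a location encoder's outputs toward the graph encoder's embedding of the same place.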
Problem

Research questions and friction points this paper is trying to address.

geospatial representation
OpenStreetMap
location embedding
heterogeneous graph
global location
Innovation

Methods, ideas, or system contributions that make the work stand out.

OSMGraphCLIP
heterogeneous graph
contrastive learning
geospatial representation
multi-scale graph encoder