🤖 AI Summary
Urban databases require continuous updates of semantic and geospatial information for infrastructure elements (e.g., traffic signs, trees, graffiti, road damage), yet conventional manual data collection is costly and inefficient. To address this, we propose a single-image geolocalization method for urban assets and events that relies on monocular street-view imagery alone. Our approach combines a metric depth estimation model, calibrated camera intrinsic and extrinsic parameters, geometric projection, and LiDAR point cloud–aided distance calibration to directly infer 3D object positions and project them into a geographic coordinate system. This work presents the first single-image-driven, city-scale semantic object georeferencing framework, requiring neither multi-view geometry nor image sequences. Experiments demonstrate bounded localization errors for traffic signs and road damage, alongside strong generalization across road and vegetation regions. The method substantially reduces the manual effort and time required to maintain urban databases.
📝 Abstract
To maintain an overview of urban conditions, city administrations manage databases of objects like traffic signs and trees, complete with their geocoordinates. Incidents such as graffiti or road damage are also relevant. As digitization increases, so does the need for more data and up-to-date databases, requiring significant manual effort. This paper introduces MapAnything, a module that automatically determines the geocoordinates of objects using individual images. Utilizing advanced Metric Depth Estimation models, MapAnything calculates geocoordinates based on the object's distance from the camera, geometric principles, and camera specifications. We detail and validate the module, providing recommendations for automating urban object and incident mapping. Our evaluation measures the accuracy of estimated distances against LiDAR point clouds in urban environments, analyzing performance across distance intervals and semantic areas like roads and vegetation. The module's effectiveness is demonstrated through practical use cases involving traffic signs and road damage.
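The core computation described above, turning a detected object's pixel location and estimated camera distance into geocoordinates, can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's actual implementation: the function name, parameter layout, the pinhole camera model, and the flat-earth small-offset approximation for converting metric offsets to latitude/longitude are all assumptions.

```python
import math

def pixel_to_geocoord(u, v, depth_m, fx, fy, cx, cy,
                      cam_lat, cam_lon, heading_deg):
    """Hypothetical sketch: map a pixel with a metric depth estimate
    to latitude/longitude, given camera intrinsics (fx, fy, cx, cy),
    the camera's geoposition, and its compass heading.
    """
    # Back-project the pixel into the camera frame (pinhole model):
    # x points right, y down, z forward along the optical axis.
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy  # vertical offset; unused for 2D mapping
    z = depth_m

    # Rotate the horizontal offset (x, z) by the camera heading,
    # measured clockwise from north, into east/north components.
    h = math.radians(heading_deg)
    east = x * math.cos(h) + z * math.sin(h)
    north = -x * math.sin(h) + z * math.cos(h)

    # Convert metric east/north offsets to degrees using a local
    # flat-earth approximation (adequate for street-scale distances).
    R = 6378137.0  # WGS84 equatorial radius in metres
    lat = cam_lat + math.degrees(north / R)
    lon = cam_lon + math.degrees(east / (R * math.cos(math.radians(cam_lat))))
    return lat, lon
```

For example, an object at the image centre, 100 m ahead of a north-facing camera, would be placed roughly 100 m due north of the camera's geoposition. A production pipeline would more likely use a geodesic library and the full extrinsic rotation (pitch and roll, not just heading).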