🤖 AI Summary
This study addresses the challenge of GPS failure and map unavailability in high-density urban environments such as Chinese urban villages, where dense building structures cause severe signal occlusion. To overcome this limitation, the authors propose a vision-based geolocalization approach leveraging a low-cost dual-camera system that simultaneously captures 360° panoramic images and standard-view query images. They construct the first visual localization dataset specifically tailored to urban villages, collected in Shipai Village, Guangzhou, and conduct a systematic evaluation of state-of-the-art image geolocalization models in this complex setting. The results demonstrate the feasibility of vision-based localization in densely built-up areas, delineate the performance boundaries of current methods, and offer technical foundations for applications including navigation for vulnerable populations, last-mile delivery, and emergency response.
📝 Abstract
Urban villages, the widespread informal settlements that have emerged as a result of rapid urbanization, are now major residential hubs for migrant workers in China's large cities. The dense arrangement of buildings in these areas often leads to unreliable GPS signals, while incomplete mapping data further impairs accurate route planning and navigation. These issues not only hinder everyday mobility but also pose significant challenges for emergency response, as confusing road layouts and GPS inaccuracies can complicate evacuation efforts. To address these challenges, we propose a practical vision-based geo-localization solution tailored for dense urban environments. Our approach features a low-cost data collection pipeline that uses a dual-camera system, comprising a panoramic camera and a smartphone camera, to capture synchronized 360-degree panoramas and query images. Using Shipai Village, a well-known, densely populated urban village in Guangzhou, as a case study, we develop a specialized image geo-localization dataset. We then assess and compare the performance of existing models across various scene types to identify their strengths and weaknesses. The findings demonstrate both the potential and the limitations of vision-based localization in dense urban-village environments. Our framework aims to enhance pedestrian navigation, last-mile delivery, and emergency management in areas with poor GPS coverage, ultimately supporting the vulnerable populations living within these informal settlements.