🤖 AI Summary
This study addresses the performance degradation that Advanced Driver Assistance Systems (ADAS) suffer when deployed across countries, caused by differences in traffic signs, regulations, and visual conventions, and notes that conventional on-road data collection is costly and inefficient. The authors propose using publicly available street-view imagery to guide data acquisition in target countries. They introduce two place-of-interest (POI) scoring strategies to identify representative locations: a KNN feature distance computed with a vision foundation model, and visual attribution from a vision-language model. For repeatable evaluation, a collect-detect protocol is applied to a co-located dataset that pairs Mapillary street-view images with in-vehicle images from the Zenseact Open Dataset. The work is presented as the first use of street-view imagery for cross-country ADAS data collection. On traffic sign detection, the approach achieves performance comparable to random sampling while using only half of the target-domain data, which supports the feasibility of scalable, low-cost model adaptation at country scale.
📝 Abstract
Deploying advanced driver assistance systems (ADAS) and automated driving systems (ADS) across countries remains challenging due to differences in legislation, traffic infrastructure, and visual conventions, which introduce domain shifts that degrade perception performance. Traditional cross-country data collection relies on extensive on-road driving, which makes identifying representative locations costly and inefficient. To address this, we propose a street-view-guided data acquisition strategy that leverages publicly available imagery to identify places of interest (POIs). Two POI scoring methods are introduced: a KNN-based feature distance approach using a vision foundation model, and a visual-attribution approach using a vision-language model. To enable repeatable evaluation, we adopt a collect-detect protocol and construct a co-located dataset by pairing the Zenseact Open Dataset with Mapillary street-view images. Experiments on traffic sign detection, a task particularly sensitive to cross-country variations in sign appearance, show that our approach achieves performance comparable to random sampling while using only half of the target-domain data. We further provide cost estimates for full-country analysis, demonstrating that large-scale street-view processing remains economically feasible. These results highlight the potential of street-view-guided data acquisition for efficient and cost-effective cross-country model adaptation.
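To make the KNN-based feature distance idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each street-view location has already been embedded by some vision model, and it scores a location by the mean Euclidean distance from its embedding to its k nearest source-domain embeddings. The function names, the choice of Euclidean distance, the value of k, and the convention that a larger distance marks a more interesting location are all illustrative assumptions; the abstract does not specify them.

```python
import math


def knn_distance_score(candidate, source_features, k=3):
    """Mean Euclidean distance from `candidate` to its k nearest source features.

    Assumption: a larger score means the location looks less like the
    source-domain data and is therefore a stronger POI candidate.
    """
    if not source_features:
        raise ValueError("source_features must not be empty")
    distances = sorted(math.dist(candidate, f) for f in source_features)
    nearest = distances[:min(k, len(distances))]
    return sum(nearest) / len(nearest)


def rank_locations(candidates, source_features, k=3):
    """Return location ids sorted from highest to lowest KNN distance score."""
    scores = {
        loc: knn_distance_score(feat, source_features, k)
        for loc, feat in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy 2-D vectors standing in for real image embeddings (hypothetical data).
source = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
candidates = {"loc_a": (0.5, 0.5), "loc_b": (5.0, 5.0), "loc_c": (2.0, 2.0)}

ranking = rank_locations(candidates, source, k=2)
print(ranking)
```

In this toy setup, locations far from the source-domain cluster rank first. A real pipeline would replace the 2-D tuples with high-dimensional embeddings and would likely use an approximate nearest-neighbor index instead of a full sort.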