🤖 AI Summary
To address data scarcity and poor generalization in few-shot cross-national traffic sign detection, this paper proposes a few-shot domain adaptation method built upon Faster R-CNN. Unlike prior few-shot object detection (FSOD) methods that freeze backbone layers, the approach jointly optimizes all network parameters to improve feature transferability. It introduces a pseudo-support set generation mechanism that leverages GANs and geometric transformations to augment target-domain samples. L2 embedding normalization is adopted to suppress intra-class variation, and multi-source pretraining across diverse national traffic sign datasets is combined with cross-domain adaptation. On the BDTSD benchmark, the method improves mAP by 2.4×, 2.2×, 1.5×, and 1.3× over the state of the art in the 1-, 3-, 5-, and 10-shot settings, respectively, and it also outperforms prior work on multiple cross-domain FSOD benchmarks.
📝 Abstract
Automatic Traffic Sign Recognition is paramount in modern transportation systems, motivating several research endeavors to improve performance by utilizing large-scale datasets. Because the appearance of traffic signs varies across countries, curating large-scale datasets is often impractical, which calls for efficient models that can produce satisfactory performance using limited data. To this end, we present 'FUSED-Net', built upon Faster R-CNN for traffic sign detection and enhanced by Unfrozen Parameters, Pseudo-Support Sets, Embedding Normalization, and Domain Adaptation, while reducing the data requirement. Unlike traditional approaches, we keep all parameters unfrozen during training, enabling FUSED-Net to learn from limited samples. The generation of a Pseudo-Support Set through data augmentation further enhances performance by compensating for the scarcity of target-domain data. Additionally, Embedding Normalization is incorporated to reduce intra-class variance by standardizing the feature representation. Domain Adaptation, achieved by pre-training on a diverse traffic sign dataset distinct from the target domain, improves model generalization. Evaluating FUSED-Net on the BDTSD dataset, we achieve 2.4x, 2.2x, 1.5x, and 1.3x improvements in mAP in the 1-shot, 3-shot, 5-shot, and 10-shot scenarios, respectively, compared to state-of-the-art Few-Shot Object Detection (FSOD) models. Additionally, we outperform state-of-the-art works on the cross-domain FSOD benchmark under several scenarios.
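To make the Embedding Normalization idea concrete, here is a minimal sketch of how normalizing feature vectors can reduce intra-class variance. It assumes Euclidean (L2) normalization, which the abstract itself does not specify, and the helper names (`l2_normalize`, `dot`) and the toy embeddings are hypothetical illustrations, not taken from FUSED-Net.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a feature vector to unit Euclidean length (assumed L2 norm)."""
    norm = math.sqrt(sum(x * x for x in vec))
    # eps guards against division by zero for an all-zero vector
    return [x / max(norm, eps) for x in vec]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical embeddings of two samples from the same sign class:
# same direction in feature space, very different magnitudes.
emb_a = [3.0, 4.0]
emb_b = [30.0, 40.0]

norm_a = l2_normalize(emb_a)
norm_b = l2_normalize(emb_b)

# Before normalization the two vectors are far apart in Euclidean distance;
# after normalization they coincide, so the magnitude difference no longer
# contributes to the spread within the class.
raw_distance = math.dist(emb_a, emb_b)
normalized_distance = math.dist(norm_a, norm_b)
cosine_similarity = dot(norm_a, norm_b)
```

In this toy case the magnitude gap disappears after normalization and only the direction of the embedding remains, which is one plausible way normalization could standardize the feature representation when only a few target-domain samples are available.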