🤖 AI Summary
To address poor visibility and domain shift in drone-based vehicle detection under Nordic snowy conditions, this paper proposes the sideload-CL-adaptation framework. It first pretrains a CNN feature extractor with self-supervised contrastive learning on unlabeled snowy images to harness their latent discriminative information. In the fine-tuning stage, this pretrained extractor is sideloaded onto a frozen lightweight YOLO11n backbone, and its features are fused with multi-granularity representations to enable effective knowledge transfer while preserving inference efficiency. Evaluated on the NVD dataset, the method improves mAP50 by 3.8% to 9.5%, demonstrating strong generalization under low-resource, high-interference conditions and validating its robustness for adverse-weather drone perception.
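The summary mentions contrastive pretraining on unlabeled snowy images but the paper's code is not shown here. As a rough illustration only, the following is a minimal NumPy sketch of an InfoNCE-style contrastive objective over two augmented views of the same batch; the function name, temperature value, and implementation details are assumptions, not the authors' actual loss.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Minimal InfoNCE-style contrastive loss (illustrative sketch).

    z1, z2: (N, D) embeddings of two augmented views of the same N images;
    matching rows are positive pairs, all other rows serve as negatives.
    """
    # L2-normalize so the dot product is cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; minimize their negative log-likelihood.
    return -np.mean(np.diag(log_prob))
```

Matched views (identical embeddings) should yield a much lower loss than mismatched pairs, which is what drives the extractor to learn discriminative features without labels.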
📝 Abstract
Aside from common challenges in remote sensing, such as small, sparse targets and tight computation budgets, detecting vehicles from UAV images in Nordic regions faces severe visibility challenges and domain shifts caused by varying levels of snow coverage. Although annotated data are expensive, unannotated data are cheap to obtain by simply flying the drones. In this work, we propose a sideload-CL-adaptation framework that exploits unannotated data to improve vehicle detection with lightweight models. Specifically, we train a CNN-based representation extractor through contrastive learning on the unannotated data in the pretraining stage, and then sideload it onto a frozen YOLO11n backbone in the fine-tuning stage. To find a robust sideload-CL-adaptation, we conducted extensive experiments comparing various fusion methods and granularities. Our proposed sideload-CL-adaptation model improves detection performance by 3.8% to 9.5% in terms of mAP50 on the NVD dataset.
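The abstract describes sideloading a pretrained extractor onto a frozen detector backbone and fusing the two feature streams. A minimal PyTorch sketch of that wiring, assuming channel-concatenation followed by a 1x1 convolution as one possible fusion choice (class and argument names are hypothetical, not the paper's code):

```python
import torch
import torch.nn as nn

class SideloadFusion(nn.Module):
    """Illustrative sketch of sideloading a CL-pretrained extractor onto a
    frozen detector backbone, with a simple 1x1-conv fusion of the two
    feature maps. Names and the fusion choice are assumptions."""

    def __init__(self, backbone: nn.Module, cl_extractor: nn.Module, channels: int):
        super().__init__()
        self.backbone = backbone
        self.cl_extractor = cl_extractor
        # Freeze the detector backbone, as in the fine-tuning stage.
        for p in self.backbone.parameters():
            p.requires_grad = False
        # Fuse concatenated features back to the original channel count.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        f_det = self.backbone(x)      # frozen detector features
        f_cl = self.cl_extractor(x)   # contrastively pretrained features
        return self.fuse(torch.cat([f_det, f_cl], dim=1))
```

Because only the fusion layer (and, depending on the setup, the sideloaded extractor and detection head) receives gradients, the frozen backbone keeps its original inference cost while still benefiting from the snow-domain features.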