AI Summary
To address the low accuracy and poor real-time performance of small-object detection in UAV imagery, this paper proposes a single-stage, high-precision detection framework. First, we design a Scale-Invariant Feature Disentangling (SIFD) module that explicitly separates scale-related and scale-invariant features. Second, we introduce an adversarial feature learning scheme to make the disentanglement more robust. Third, we construct State-Air, the first multimodal UAV dataset incorporating flight-control state parameters. Our method is built upon lightweight single-stage detectors (YOLOv5/v8/PP-YOLOE) and optimized end-to-end. It achieves state-of-the-art (SoTA) performance on both public and in-house datasets, with significant improvements in small-object AP while maintaining real-time inference speed (>30 FPS). The source code and the State-Air dataset will be publicly released.
Abstract
Detecting objects from Unmanned Aerial Vehicles (UAVs) is often hindered by the large number of small objects, which results in low detection accuracy. To address this issue, mainstream approaches typically rely on multi-stage inference. Despite their remarkable detection accuracy, they sacrifice real-time efficiency, making them less practical for real-world applications. To this end, we propose to improve single-stage inference accuracy by learning scale-invariant features. Specifically, a Scale-Invariant Feature Disentangling module is designed to disentangle scale-related and scale-invariant features. An Adversarial Feature Learning scheme is then employed to enhance the disentanglement. Finally, the scale-invariant features are leveraged for robust UAV-based object detection. Furthermore, we construct a multi-modal UAV object detection dataset, State-Air, which incorporates annotated UAV state parameters. We apply our approach to three lightweight detection frameworks on two benchmark datasets. Extensive experiments demonstrate that our approach effectively improves model accuracy and achieves state-of-the-art (SoTA) performance on both datasets. Our code and dataset will be made publicly available once the paper is accepted.
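The abstract does not spell out how the disentangling and adversarial steps are implemented, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a fixed channel split (the actual module is presumably learned), hypothetical linear scale classifiers `w_inv` and `w_rel`, and a discretized object-scale label `scale_bin`; all of these names are illustrative. The scale-related branch is trained to predict the scale bin, while the feature extractor is rewarded when a discriminator fails to predict it from the scale-invariant branch.

```python
import math


def split_features(features, num_invariant):
    """Split a channel vector into (scale_invariant, scale_related) parts.

    Assumption: a fixed split by channel index, used here for illustration only.
    """
    if not 0 < num_invariant < len(features):
        raise ValueError("num_invariant must leave both parts non-empty")
    return features[:num_invariant], features[num_invariant:]


def linear(weights, vec):
    """Apply a bias-free linear layer given as a list of weight rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one sample."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[target]


def disentangling_losses(features, num_invariant, w_inv, w_rel, scale_bin):
    """Return the loss terms of a generic adversarial disentangling setup.

    - "scale_related": the scale-related part should predict the scale bin.
    - "discriminator": a scale classifier on the invariant part (the
      discriminator minimizes this term).
    - "adversarial": the negated discriminator loss, which the feature
      extractor minimizes so that scale cannot be recovered from the
      invariant part.
    """
    invariant, related = split_features(features, num_invariant)
    loss_related = cross_entropy(linear(w_rel, related), scale_bin)
    loss_disc = cross_entropy(linear(w_inv, invariant), scale_bin)
    return {
        "scale_related": loss_related,
        "discriminator": loss_disc,
        "adversarial": -loss_disc,
    }


# Hypothetical usage: 4 feature channels, 3 scale bins (small/medium/large).
features = [0.5, -1.0, 2.0, 0.25]
w_inv = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1]]
w_rel = [[0.4, 0.0], [-0.3, 0.2], [0.1, 0.1]]
losses = disentangling_losses(features, 2, w_inv, w_rel, scale_bin=0)
```

In a real detector these terms would be added to the detection loss and optimized with automatic differentiation, for example through a gradient reversal layer. That wiring is omitted here because the abstract does not describe it.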