🤖 AI Summary
This work addresses distance estimation for small unmanned aerial vehicles in long-range monocular imagery, where extreme scale variation, cluttered backgrounds, and visual noise severely degrade performance. The authors propose DroneDAR, a model that fuses geometric features derived from detector bounding boxes with visual appearance cues extracted by a convolutional backbone through a lightweight gating mechanism, with the aim of accurate ranging even when the target is extremely small. The study compares multiple regression loss functions and systematically analyzes the impact of crop resolution and backbone capacity. Experiments show that DroneDAR improves ranging accuracy in long-range scenarios and expose failure modes tied to bounding-box noise sensitivity and reduced texture detail, offering guidance for real-world deployment.
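To make the bounding-box noise failure mode concrete, the sketch below uses the standard pinhole relation d = f · W / w (focal length f in pixels, physical width W in metres, box width w in pixels). It is an illustration only, not the DroneDAR model: the focal length, the drone width, and both function names are assumed values chosen for the example.

```python
def pinhole_range(focal_px: float, object_width_m: float, box_width_px: float) -> float:
    """Range in metres from the pinhole relation d = f * W / w."""
    if box_width_px <= 0:
        raise ValueError("box_width_px must be positive")
    return focal_px * object_width_m / box_width_px


def relative_range_error(box_width_px: float, noise_px: float) -> float:
    """Relative range error when the detected box is `noise_px` too wide.

    d_noisy / d_true - 1 = w / (w + noise) - 1, independent of f and W.
    """
    return box_width_px / (box_width_px + noise_px) - 1.0


# Assumed values for illustration only: 2000 px focal length, 0.4 m wide drone.
FOCAL_PX = 2000.0
DRONE_WIDTH_M = 0.4

for width_px in (80.0, 20.0, 4.0):
    d = pinhole_range(FOCAL_PX, DRONE_WIDTH_M, width_px)
    err = relative_range_error(width_px, noise_px=1.0)
    print(f"box {width_px:4.0f} px -> range {d:6.1f} m, 1 px error -> {err:+.1%}")
```

Under these assumptions, a one-pixel error shifts the estimate by roughly 1% for an 80-pixel box but by 20% for a 4-pixel box, which is why purely geometric ranging degrades quickly once the drone occupies only a few pixels.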
📝 Abstract
Accurate distance estimation for small drones in long-range imagery is important for tracking and situational awareness, yet remains challenging due to extreme target scale variation, background clutter, and noisy visual cues. This paper studies monocular drone distance estimation using image crops together with bounding-box geometry, a practical setting in which a detector provides a candidate drone region and the model predicts range from appearance and box-derived features. We evaluate a Droneranger-style baseline and introduce DroneDAR (Drone Detection And Ranging), a model that combines a convolutional backbone with explicit bounding-box cues through a lightweight gating mechanism. Experiments analyze how backbone capacity, crop resolution, and regression loss functions affect performance across distance regimes. We further examine common failure modes at long distances, including sensitivity to bounding-box noise and reduced texture detail in the crop. The results provide guidance for designing and training range estimators that remain robust under real-world long-range conditions and highlight directions for improving reliability when drones occupy only a few pixels.
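The abstract does not specify the form of the gating mechanism. As a minimal sketch of one common design, the code below computes a per-dimension sigmoid gate from the concatenated appearance and box features and uses it to blend the two vectors. Every name here (`box_features`, `gated_fusion`, the chosen geometry cues, the weight shapes) is a hypothetical stand-in rather than the authors' implementation, and the weights are fixed by hand instead of learned.

```python
import math
from typing import List, Sequence


def box_features(box: Sequence[float], img_w: int, img_h: int) -> List[float]:
    """Normalised geometry cues from an (x1, y1, x2, y2) pixel box (assumed feature set)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    if w <= 0 or h <= 0:
        raise ValueError("box must have positive width and height")
    return [
        w / img_w,                   # relative width
        h / img_h,                   # relative height
        (w * h) / (img_w * img_h),   # relative area
        (x1 + x2) / (2 * img_w),     # normalised centre x
        (y1 + y2) / (2 * img_h),     # normalised centre y
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def linear(weights: Sequence[Sequence[float]], bias: Sequence[float],
           x: Sequence[float]) -> List[float]:
    """Dense layer: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def gated_fusion(visual: Sequence[float], geom: Sequence[float],
                 gate_w: Sequence[Sequence[float]],
                 gate_b: Sequence[float]) -> List[float]:
    """Blend two equal-length feature vectors with a per-dimension gate.

    gate  = sigmoid(W [visual; geom] + b)
    fused = gate * visual + (1 - gate) * geom
    """
    if len(visual) != len(geom):
        raise ValueError("visual and geom must have the same length")
    gate = [sigmoid(z) for z in linear(gate_w, gate_b, list(visual) + list(geom))]
    return [g * v + (1.0 - g) * q for g, v, q in zip(gate, visual, geom)]


# Toy inputs with made-up numbers; a trained model would learn gate_w and gate_b.
geom = box_features((950, 530, 962, 538), img_w=1920, img_h=1080)
visual = [0.2, -0.1, 0.4, 0.0, 0.3]        # stand-in for pooled CNN features
gate_w = [[0.0] * 10 for _ in range(5)]    # zero weights give a gate of 0.5
gate_b = [0.0] * 5
fused = gated_fusion(visual, geom, gate_w, gate_b)
print(fused)
```

With zero gate weights the output is the plain average of the two inputs. A learned gate could instead lean on the box cues when the crop carries little texture and on appearance when the box is noisy. In a real network the box features would typically be projected to the backbone's feature width before blending.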