🤖 AI Summary
In remote sensing object detection, fine-tuning ImageNet-pretrained backbones often degrades the low-level visual features that aerial imagery depends on. To address this, we propose Dynamic Backbone Freezing (DBF), a method that adaptively modulates layer-wise update intensity during training via a learnable freezing scheduler, balancing the preservation of generic low-level features against the acquisition of remote sensing–specific representations. DBF requires no architectural modifications, introduces zero additional parameters, and is plug-and-play. On DOTA and DIOR-R, DBF consistently improves detection accuracy (mAP gains of 2.1–3.4 points) while reducing training computational cost (FLOPs reduced by 18–25%). Its core innovation is to formulate backbone layer updates as a continuous, hierarchical, and task-driven dynamic process, described as the first such formulation and as a new paradigm for long-horizon training of remote sensing vision models.
📝 Abstract
Numerous recent methods, built on convolutional or transformer architectures, have achieved impressive performance in remote sensing object detection. Such detectors typically use a feature backbone to extract useful features from raw input images. In the remote sensing domain, a common practice is to initialize the backbone with weights pre-trained on ImageNet, which consists of natural scenes. The backbone is then typically fine-tuned to produce features suited to remote sensing images. However, over long training schedules this fine-tuning can hinder the extraction of basic visual features and thereby limit performance gains. To mitigate this issue, we propose DBF (Dynamic Backbone Freezing), a novel method for fine-tuning the feature backbone in remote sensing object detection. DBF addresses the dilemma of whether the backbone should extract generic low-level features or acquire knowledge specific to the remote sensing domain by introducing a module called the 'Freezing Scheduler', which dynamically manages the update of backbone features during training. Extensive experiments on DOTA and DIOR-R show that our approach enables more accurate model learning while substantially reducing computational costs. Thanks to its straightforward design, our method can be adopted seamlessly without additional effort.
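As an illustration only, the idea of a scheduler that switches the backbone between frozen and trainable states can be sketched in a few lines of framework-agnostic Python. The abstract does not specify the rule the Freezing Scheduler follows, so the periodic rule below (backbone trainable once every `interval` epochs) is an assumption, and the names `FreezingScheduler`, `interval`, and `Param` are hypothetical stand-ins rather than the authors' implementation.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class FreezingScheduler:
    """Hypothetical periodic rule: the backbone is trainable only on
    epochs that are multiples of `interval`, and frozen otherwise."""

    interval: int = 3

    def __post_init__(self) -> None:
        if self.interval < 1:
            raise ValueError("interval must be a positive integer")

    def backbone_trainable(self, epoch: int) -> bool:
        # Assumed rule, not taken from the paper.
        return epoch % self.interval == 0

    def apply(self, backbone_params: Iterable[Any], epoch: int) -> bool:
        """Set `requires_grad` on every backbone parameter for this epoch
        and return whether the backbone is trainable."""
        trainable = self.backbone_trainable(epoch)
        for param in backbone_params:
            param.requires_grad = trainable
        return trainable


@dataclass
class Param:
    """Stand-in for a framework parameter that exposes `requires_grad`."""

    requires_grad: bool = True


# Toy backbone with two parameters, scheduled over six epochs.
backbone: List[Param] = [Param(), Param()]
scheduler = FreezingScheduler(interval=3)
schedule = [scheduler.apply(backbone, epoch) for epoch in range(6)]
print(schedule)
```

In a real training loop, `apply` would be called at the start of each epoch on the detector's backbone parameters. On frozen epochs no backbone gradients are needed, which is one plausible source of the reduced computational cost the abstract reports.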