🤖 AI Summary
Existing real-time detectors struggle to accurately model the rotational characteristics of arbitrarily oriented objects in remote sensing images, leading to angular representation bias, suboptimal matching costs, and unstable training. To address these challenges, this work proposes RTOD-DETR, the first end-to-end real-time oriented object detection Transformer. It introduces three key innovations: iterative refinement based on angular distribution, bipartite matching using a Chamfer distance over vertex sets, and oriented contrastive denoising. These mechanisms collectively enhance angular accuracy and training stability. On the DOTA1.0 benchmark, RTOD-DETR achieves an AP50 of 77.73%–80.15% while running at 119–132 FPS on an NVIDIA GeForce RTX 2080 Ti GPU, demonstrating a strong balance between detection accuracy and computational efficiency.
📝 Abstract
Recent real-time detection transformers have gained popularity due to their simplicity and efficiency. However, these detectors do not explicitly model object rotation, which is problematic in remote sensing imagery where objects appear at arbitrary angles, leading to challenges in angle representation, matching cost, and training stability. In this paper, we propose a real-time oriented object detection transformer, to the best of our knowledge the first real-time end-to-end oriented object detector, that addresses these issues. Specifically, angle distribution refinement reformulates angle regression as an iterative refinement of probability distributions, thereby capturing the uncertainty of object rotation and providing a more fine-grained angle representation. We then incorporate a Chamfer distance cost into bipartite matching, measuring box distance via vertex sets, which enables more accurate geometric alignment and eliminates ambiguous matches. Moreover, we propose oriented contrastive denoising to stabilize training and analyze four noise modes. We also observe that the same ground truth can be assigned to queries at different indices across decoder layers, and analyze this issue using a proposed instability metric. We design a series of model variants and experiments to validate the proposed method. Notably, our O2-DFINE-L, O2-RTDETR-R50, and O2-DEIM-R50 achieve 77.73%/78.45%/80.15% AP50 on DOTA1.0 at 132/119/119 FPS on an NVIDIA RTX 2080 Ti GPU. Code is available at https://github.com/wokaikaixinxin/ai4rs.
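To make the vertex-set matching cost concrete, here is a minimal NumPy sketch of a symmetric Chamfer distance between the corner sets of two oriented boxes. This is our own illustration under simple assumptions (boxes given as center, width, height, angle in radians), not the paper's implementation; function names are hypothetical. It shows why a vertex-set cost sidesteps angular ambiguity: a square rotated by 90° has an identical vertex set, so its cost to the unrotated square is zero, whereas a raw angle-difference cost would be large.

```python
import numpy as np

def obb_vertices(cx, cy, w, h, theta):
    """Return the 4 corner vertices of an oriented box as a (4, 2) array.

    theta is the rotation angle in radians (counter-clockwise).
    """
    c, s = np.cos(theta), np.sin(theta)
    # Corner offsets in the box frame, then rotated into the image frame.
    dx = np.array([-w / 2, w / 2, w / 2, -w / 2])
    dy = np.array([-h / 2, -h / 2, h / 2, h / 2])
    xs = cx + c * dx - s * dy
    ys = cy + s * dx + c * dy
    return np.stack([xs, ys], axis=1)

def chamfer_cost(va, vb):
    """Symmetric Chamfer distance between two (4, 2) vertex sets:
    mean nearest-neighbor distance in each direction, summed."""
    d = np.linalg.norm(va[:, None, :] - vb[None, :, :], axis=-1)  # (4, 4) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# A 2x2 square and the same square rotated 90 degrees: identical vertex set,
# so the Chamfer cost is 0 despite the large nominal angle difference.
a = obb_vertices(0, 0, 2, 2, 0.0)
b = obb_vertices(0, 0, 2, 2, np.pi / 2)
print(chamfer_cost(a, b))  # → 0.0 (up to floating-point error)
```

In a DETR-style matcher, such a cost would be computed for every (query, ground-truth) pair and combined with classification cost before running the Hungarian assignment; the sketch above covers only the geometric term.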