Real-Time Oriented Object Detection Transformer in Remote Sensing Images

📅 2026-03-16
🤖 AI Summary
Existing real-time detection transformers do not explicitly model the rotation of arbitrarily oriented objects in remote sensing images, which causes difficulties in angle representation, matching cost design, and training stability. To address these challenges, this work proposes RTOD-DETR, which the authors describe as, to the best of their knowledge, the first real-time end-to-end oriented object detection transformer. It introduces three components: angle distribution refinement, which recasts angle regression as iterative refinement of a probability distribution; a Chamfer distance cost over box vertex sets for bipartite matching; and oriented contrastive denoising to stabilize training. On the DOTA1.0 benchmark, the variants O2-DFINE-L, O2-RTDETR-R50, and O2-DEIM-R50 reach 77.73%, 78.45%, and 80.15% AP50 at 132, 119, and 119 FPS, respectively, on an NVIDIA GeForce RTX 2080 Ti GPU.

📝 Abstract
Recent real-time detection transformers have gained popularity due to their simplicity and efficiency. However, these detectors do not explicitly model object rotation, especially in remote sensing imagery where objects appear at arbitrary angles, leading to challenges in angle representation, matching cost, and training stability. In this paper, we propose a real-time oriented object detection transformer, the first real-time end-to-end oriented object detector to the best of our knowledge, that addresses the above issues. Specifically, angle distribution refinement is proposed to reformulate angle regression as an iterative refinement of probability distributions, thereby capturing the uncertainty of object rotation and providing a more fine-grained angle representation. Then, we incorporate a Chamfer distance cost into bipartite matching, measuring box distance via vertex sets, enabling more accurate geometric alignment and eliminating ambiguous matches. Moreover, we propose oriented contrastive denoising to stabilize training and analyze four noise modes. We observe that a ground truth can be assigned to different index queries across different decoder layers, and analyze this issue using the proposed instability metric. We design a series of model variants and experiments to validate the proposed method. Notably, our O2-DFINE-L, O2-RTDETR-R50 and O2-DEIM-R50 achieve 77.73%/78.45%/80.15% AP50 on DOTA1.0 and 132/119/119 FPS on the 2080ti GPU. Code is available at https://github.com/wokaikaixinxin/ai4rs.
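To make the vertex-set idea concrete, here is a minimal sketch of a Chamfer distance between two oriented boxes. It is an illustration only, not the paper's implementation: the function names `obb_vertices` and `chamfer_distance`, the (cx, cy, w, h, theta) parameterization in radians, the Euclidean point distance, and the mean-based normalization are all assumptions made for this example.

```python
import math

def obb_vertices(cx, cy, w, h, theta):
    """Return the four corners of a box centered at (cx, cy), rotated by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    corners = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each corner offset, then translate it to the box center.
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in corners]

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two vertex sets (assumed form: mean nearest-point distance, both directions)."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)

# Two parameterizations of the same rectangle: width and height swapped, rotated by 90 degrees.
box_a = obb_vertices(0.0, 0.0, 4.0, 2.0, 0.0)
box_b = obb_vertices(0.0, 0.0, 2.0, 4.0, math.pi / 2)
same_box_cost = chamfer_distance(box_a, box_b)

# The same rectangle shifted 3 units along x.
box_c = obb_vertices(3.0, 0.0, 4.0, 2.0, 0.0)
shifted_cost = chamfer_distance(box_a, box_c)
```

Because the distance is computed on vertex sets rather than on the raw (w, h, theta) parameters, `box_a` and `box_b` cost essentially zero even though their parameters differ, which is one way such a cost can avoid ambiguous matches. In a matcher, this value would fill one entry of the prediction-to-ground-truth cost matrix alongside the classification cost.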
Problem

Research questions and friction points this paper is trying to address.

oriented object detection
remote sensing images
real-time detection
angle representation
training stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

oriented object detection
real-time transformer
angle distribution refinement
Chamfer distance matching
contrastive denoising
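The angle distribution refinement listed above can be pictured with a small sketch: the angle is represented as a probability distribution over discrete bins, each decoder layer adds a residual to the bin logits, and the predicted angle is read out as the expectation. This is a simplified assumption-laden illustration, not the paper's formulation: the bin layout over [-pi/2, pi/2), the additive logit update, and the helper names `bin_centers`, `refine`, and `expected_angle` are hypothetical, and the plain expectation ignores angular periodicity.

```python
import math

def bin_centers(n, lo=-math.pi / 2, hi=math.pi / 2):
    """Centers of n equal-width angle bins covering [lo, hi) (assumed layout)."""
    width = (hi - lo) / n
    return [lo + (i + 0.5) * width for i in range(n)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def refine(logits, deltas):
    """One refinement step: add a layer's residual to the current bin logits."""
    return [l + d for l, d in zip(logits, deltas)]

def expected_angle(logits, centers):
    """Decode the angle as the expectation of the bin centers under the distribution."""
    return sum(p * c for p, c in zip(softmax(logits), centers))

centers = bin_centers(4)
logits = [0.0, 0.0, 0.0, 0.0]           # initial, fully uncertain distribution
initial_angle = expected_angle(logits, centers)

# A hypothetical decoder layer strongly favors the last bin.
logits = refine(logits, [0.0, 0.0, 0.0, 20.0])
refined_angle = expected_angle(logits, centers)
```

A flat distribution decodes to the middle of the range, and each residual update sharpens the distribution toward the bins a layer favors, so the spread of the distribution carries the uncertainty that a single regressed scalar would discard.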
Zeyu Ding
Binghamton University
privacy, security, machine learning, fairness, formal verification
Yong Zhou
School of Computer Science and Technology / School of Artificial Intelligence, China University of Mining and Technology, Xuzhou 221116, China; Mine Digitization Engineering Research Center of the Ministry of Education, Xuzhou 221116, China; Jiangsu Provincial Industrial Technology Engineering Center for Intelligent Sensing and Emergency IoT in Underground Space, Xuzhou 221116, China
Jiaqi Zhao
Xidian University
privacy-preserving machine learning
Wen-Liang Du
School of Computer Science and Technology / School of Artificial Intelligence, China University of Mining and Technology, Xuzhou 221116, China; Mine Digitization Engineering Research Center of the Ministry of Education, Xuzhou 221116, China; Jiangsu Provincial Industrial Technology Engineering Center for Intelligent Sensing and Emergency IoT in Underground Space, Xuzhou 221116, China
Xixi Li
School of Computer Science and Technology / School of Artificial Intelligence, China University of Mining and Technology, Xuzhou 221116, China; Mine Digitization Engineering Research Center of the Ministry of Education, Xuzhou 221116, China; Jiangsu Provincial Industrial Technology Engineering Center for Intelligent Sensing and Emergency IoT in Underground Space, Xuzhou 221116, China
Rui Yao
China University of Mining and Technology
Computer Vision, Machine Learning
Abdulmotaleb El Saddik
MCRLab, University of Ottawa
Immersive Media, Digital Twins, Human Centered AI, Multimedia Communication, Metaverse