🤖 AI Summary
To address the low detection accuracy on diabetic retinopathy (DR) fundus images, where lesions are small, densely packed, and hard to localize, this paper proposes RT-DETR-Med, a lightweight real-time Transformer-based detector. It is the first adaptation of RT-DETR to medical small-object detection and incorporates three key innovations: (1) a cross-scale feature alignment module; (2) a lesion-aware attention mechanism; and (3) the integration of deformable attention, dynamic label assignment, and medical-domain-specific preprocessing augmentation. Evaluated on standard fundus datasets, RT-DETR-Med achieves an mAP₅₀ of 89.3% and an mAP₅₀₋₉₅ of 76.5%, outperforming YOLOv8 and the original DETR by 6.2% and 9.8% in mAP₅₀, respectively, while improving small-object recall by 12.4%. The model retains real-time inference speed without sacrificing accuracy, offering a robust technical foundation for early DR screening.
📝 Abstract
Deep learning has emerged as a transformative approach to complex pattern recognition and object detection problems. This paper applies a novel detection framework based on the RT-DETR model to intricate image data, with a particular focus on diabetic retinopathy detection. Diabetic retinopathy, a leading cause of vision loss globally, requires accurate and efficient image analysis to identify early-stage lesions. The proposed RT-DETR model, built on a Transformer-based architecture, processes high-dimensional, complex visual data with enhanced robustness and accuracy. Comparative evaluations against YOLOv5, YOLOv8, SSD, and DETR show that RT-DETR achieves superior precision, recall, mAP50, and mAP50-95, particularly when detecting small-scale objects and densely packed targets. This study underscores the potential of Transformer-based models such as RT-DETR for advancing object detection, with promising applications in medical imaging and beyond.
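The two mAP figures differ only in how strictly a predicted box must overlap the ground truth. Under the common COCO-style convention, mAP50 counts a detection as correct at an intersection-over-union (IoU) of at least 0.50, while mAP50-95 averages over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below is illustrative only and is not taken from the paper: it computes IoU for axis-aligned boxes and shows why the stricter thresholds are hard on small lesions. The box coordinates and the helper names `iou` and `thresholds_passed` are hypothetical, and the code covers only the matching criterion, not the full average-precision calculation.

```python
from typing import List, Sequence

# A box is (x_min, y_min, x_max, y_max) in pixels. Illustrative convention only.
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style IoU thresholds behind mAP50-95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS: List[float] = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred: Box, truth: Box) -> List[float]:
    """IoU thresholds at which this prediction would count as a match."""
    score = iou(pred, truth)
    return [t for t in IOU_THRESHOLDS if score >= t]


# Hypothetical boxes: the same 2-pixel horizontal error on a small
# (10 x 10 px) and a large (100 x 100 px) target.
small_truth, small_pred = (100, 100, 110, 110), (102, 100, 112, 110)
large_truth, large_pred = (100, 100, 200, 200), (102, 100, 202, 200)

if __name__ == "__main__":
    for name, pred, truth in [
        ("small", small_pred, small_truth),
        ("large", large_pred, large_truth),
    ]:
        passed = thresholds_passed(pred, truth)
        print(f"{name}: IoU={iou(pred, truth):.3f}, "
              f"matches at {len(passed)} of {len(IOU_THRESHOLDS)} thresholds")
```

In this toy case the 2-pixel error leaves the large box matched at every threshold, while the small box matches only at the four loosest ones. This is one reason mAP50-95 is typically much lower than mAP50 on small-object tasks.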