TY-RIST: Tactical YOLO Tricks for Real-time Infrared Small Target Detection

📅 2025-09-26
🤖 AI Summary
Infrared small target detection (IRSTD) is hampered by sparse target features, strong background clutter, missed detections, and high computational cost. To address these, this paper proposes TY-RIST, an optimized YOLOv12n architecture featuring: (1) a stride-aware backbone with fine-grained receptive fields; (2) a high-resolution detection head; (3) cascaded coordinate attention blocks that improve small-target saliency; (4) a branch pruning strategy that removes redundant computation; and (5) the Normalized Gaussian Wasserstein Distance (NWD) for more stable box regression. Across four IRSTD benchmarks, the method improves mAP@0.5 by 7.9%, Precision by 3.0%, and Recall by 10.2%. It runs at up to 123 FPS on a single GPU while reducing computational cost by about 25.5%, and cross-dataset validation on a fifth dataset indicates strong generalization.

📝 Abstract
Infrared small target detection (IRSTD) is critical for defense and surveillance but remains challenging due to (1) target loss from minimal features, (2) false alarms in cluttered environments, (3) missed detections from low saliency, and (4) high computational costs. To address these issues, we propose TY-RIST, an optimized YOLOv12n architecture that integrates (1) a stride-aware backbone with fine-grained receptive fields, (2) a high-resolution detection head, (3) cascaded coordinate attention blocks, and (4) a branch pruning strategy that reduces computational cost by about 25.5% while marginally improving accuracy and enabling real-time inference. We also incorporate the Normalized Gaussian Wasserstein Distance (NWD) to enhance regression stability. Extensive experiments on four benchmarks and across 20 different models demonstrate state-of-the-art performance, improving mAP at 0.5 IoU by +7.9%, Precision by +3%, and Recall by +10.2%, while achieving up to 123 FPS on a single GPU. Cross-dataset validation on a fifth dataset further confirms strong generalization capability. Additional results and resources are available at https://www.github.com/moured/TY-RIST
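
To make the NWD term concrete, the sketch below follows the commonly published formulation: each box (cx, cy, w, h) is modeled as a 2D Gaussian, and the Wasserstein distance between the two Gaussians is mapped to a similarity in (0, 1]. It is an illustration only, not the TY-RIST implementation. The function names, the 1 - NWD loss form, and the constant `c` (a dataset-dependent scale, set here to a placeholder value) are assumptions. A plain IoU helper is included to show why an overlap-based measure is fragile for targets only a few pixels wide.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein Distance between two boxes.

    Boxes are (cx, cy, w, h). Each box is treated as a Gaussian with mean
    (cx, cy) and covariance diag(w^2 / 4, h^2 / 4). `c` is a scale constant;
    12.8 is a placeholder, not a value taken from the paper.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two axis-aligned Gaussians.
    w2_sq = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_sq) / c)


def nwd_loss(box_a, box_b, c=12.8):
    """A simple regression loss built on NWD (assumed form: 1 - NWD)."""
    return 1.0 - nwd(box_a, box_b, c)


def iou(box_a, box_b):
    """Plain IoU for (cx, cy, w, h) boxes, for comparison with NWD."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


# A 2x2 target whose prediction is shifted 3 pixels to the right.
target = (10.0, 10.0, 2.0, 2.0)
prediction = (13.0, 10.0, 2.0, 2.0)
print("IoU:", iou(target, prediction))
print("NWD:", nwd(target, prediction))
```

In this example the two boxes no longer overlap, so IoU is exactly zero and carries no information about how far off the prediction is, whereas NWD still decreases smoothly with the offset. That smoothness is the usual motivation for using NWD on very small targets.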

Problem

Research questions and friction points this paper is trying to address.

Detecting infrared small targets with minimal features
Reducing false alarms in cluttered infrared environments
Achieving real-time detection with low computational costs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimized YOLOv12n with stride-aware backbone
High-resolution head with cascaded attention blocks
Branch pruning strategy reducing computational cost

Authors

Abdulkarim Atrash, Middle East Technical University
Omar Moured, Karlsruhe Institute of Technology (Computer Vision, Vision-Language Models, Document Analysis, Assistive Tech)
Yufan Chen, Karlsruhe Institute of Technology
Jiaming Zhang, Karlsruhe Institute of Technology
Seyda Ertekin, Associate Professor @ ODTU (Machine learning, Artificial intelligence, Data Science)
Omur Ugur, Middle East Technical University