🤖 AI Summary
This study asks whether newer YOLO versions inherently outperform their predecessors across diverse real-world domains. Method: We conduct a systematic, fair evaluation of YOLOv1–v11 on ODVerse33, a newly constructed multi-domain benchmark comprising 33 datasets spanning 11 application areas (e.g., autonomous driving, agriculture, medical imaging). Comparability is ensured through standardized training protocols, unified preprocessing, and identical hardware configurations. Contribution/Results: Contrary to common assumptions, newer versions are not universally superior: YOLOv8 and YOLOv9 outperform YOLOv10 and YOLOv11 in several scenarios. Crucially, architectural lightweighting and training strategies influence cross-domain generalization more than increases in parameter count do. Our key contribution is uncovering the nonlinear relationship between architectural evolution and practical detection efficacy, and establishing ODVerse33 and a version-level attribution framework to enable reproducible, interpretable, and domain-agnostic evaluation for object detection.
📝 Abstract
You Only Look Once (YOLO) models have been widely used to build real-time object detectors across various domains. As new YOLO versions are released with increasing frequency, key questions arise. Are the newer versions always better than their predecessors? What are the core innovations in each YOLO version, and how do these changes translate into real-world performance gains? In this paper, we summarize the key innovations from YOLOv1 to YOLOv11; introduce a comprehensive benchmark called ODVerse33, which includes 33 datasets spanning 11 diverse domains (Autonomous driving, Agricultural, Underwater, Medical, Videogame, Industrial, Aerial, Wildlife, Retail, Microscopic, and Security); and explore the practical impact of model improvements in real-world, multi-domain applications through extensive experimental results. We hope this study can guide the many users of object detection models and serve as a reference for future real-time object detector development.