LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection

📅 2024-06-20

🏛️ arXiv.org

📈 Citations: 4

✨ Influential: 0

🤖 AI Summary

To address the low FLOP efficiency and suboptimal accuracy-to-computation trade-off of YOLO-style models on embedded devices, this paper proposes LeYOLO, a FLOP-aware, scalable, lightweight YOLO architecture. It introduces three main design changes: (1) an inverted-bottleneck backbone whose scaling is guided by the Information Bottleneck principle; (2) a Fast Pyramidal Architecture Network (FPAN) for efficient multiscale feature sharing; and (3) a Decoupled Network-in-Network (DNiN) detection head. By scaling these components jointly under FLOP constraints, LeYOLO reaches a FLOP-to-accuracy ratio that the authors report as previously unattained across a wide operating range. Specifically, LeYOLO-Small attains 38.2% mAP on COCO val at just 4.5 GFLOPs, a 42% reduction in computational cost versus YOLOv9-Tiny at similar accuracy. The full LeYOLO family spans 0.66 to 8.4 GFLOPs, with corresponding mAPs of 25.2 to 41.0.

📝 Abstract

Computational efficiency in deep neural networks is critical for object detection, especially as newer models prioritize speed over efficient computation (FLOP). This evolution has somewhat left behind embedded and mobile-oriented AI object detection applications. In this paper, we focus on design choices of neural network architectures for efficient object detection computation based on FLOP and propose several optimizations to enhance the efficiency of YOLO-based models. Firstly, we introduce an efficient backbone scaling inspired by inverted bottlenecks and theoretical insights from the Information Bottleneck principle. Secondly, we present the Fast Pyramidal Architecture Network (FPAN), designed to facilitate fast multiscale feature sharing while reducing computational resources. Lastly, we propose a Decoupled Network-in-Network (DNiN) detection head engineered to deliver rapid yet lightweight computations for classification and regression tasks. Building upon these optimizations and leveraging more efficient backbones, this paper contributes to a new scaling paradigm for object detection and YOLO-centric models called LeYOLO. Our contribution consistently outperforms existing models under various resource constraints, achieving an unprecedented accuracy-to-FLOP ratio. Notably, LeYOLO-Small achieves a competitive mAP score of 38.2% on COCO val with just 4.5 GFLOPs, representing a 42% reduction in computational load compared to the latest state-of-the-art YOLOv9-Tiny model while achieving similar accuracy. Our novel model family achieves a FLOP-to-accuracy ratio previously unattained, offering scalability that spans from ultra-low neural network configurations (<1 GFLOP) to efficient yet demanding object detection setups (>4 GFLOPs), with 25.2, 31.3, 35.2, 38.2, 39.3 and 41 mAP for 0.66, 1.47, 2.53, 4.51, 5.8 and 8.4 GFLOPs, respectively.
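
The figures quoted in the abstract can be turned into a quick worked example. The sketch below is not from the paper: it simply divides each reported mAP by its GFLOP count and back-computes the baseline cost implied by the stated 42% reduction. The helper names (`map_per_gflop`, `implied_baseline_gflops`) are hypothetical, and the implied baseline is an arithmetic consequence of the abstract's numbers, not a figure stated in the source.

```python
# (GFLOPs, COCO mAP) pairs copied from the abstract, smallest to largest model.
# Only "LeYOLO-Small" (4.5 GFLOPs, 38.2 mAP) is named in the abstract, so the
# other variants are left unnamed here.
family = [
    (0.66, 25.2),
    (1.47, 31.3),
    (2.53, 35.2),
    (4.51, 38.2),
    (5.8, 39.3),
    (8.4, 41.0),
]


def map_per_gflop(gflops, m_ap):
    """mAP points obtained per GFLOP spent (hypothetical helper)."""
    return m_ap / gflops


def implied_baseline_gflops(gflops, reduction):
    """Cost of a baseline that `gflops` undercuts by `reduction` (e.g. 0.42)."""
    return gflops / (1.0 - reduction)


# Efficiency ratio for each family member, rounded for display.
ratios = [round(map_per_gflop(g, m), 2) for g, m in family]

# Baseline cost implied by "4.5 GFLOPs is a 42% reduction".
baseline = implied_baseline_gflops(4.5, 0.42)

for (g, m), r in zip(family, ratios):
    print(f"{g:>5} GFLOPs  {m:>5} mAP  {r:>6} mAP/GFLOP")
print(f"implied baseline: {baseline:.2f} GFLOPs")
```

The ratio falls steadily as the models grow (from roughly 38 mAP per GFLOP at the smallest size to roughly 5 at the largest), which is the usual diminishing-returns pattern. For that reason, comparisons between detectors are more meaningful at matched FLOP budgets than across the whole range.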

Problem

Research questions and friction points this paper is trying to address.

Improving parameter and FLOP efficiency in lightweight object detection models

Bridging performance gap between YOLO and SSDLite for low-resource devices

Achieving YOLO-like accuracy with MobileNet-level compactness for embedded systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

LeNeck framework improves accuracy and reduces parameters

LeYOLO model enhances YOLO computational efficiency

Optimized for mobile, embedded, and ultra-low-power devices
