AI Summary
To address the challenges of resource constraints, real-time performance, and energy efficiency on edge FPGA platforms, this paper proposes a lightweight YOLOv5 hardware acceleration architecture tailored for the Xilinx Kria KV260. The method integrates model pruning, low-bit quantization (e.g., INT4/INT8), and a customized CNN pipeline with optimized dataflow and memory hierarchy. This design significantly reduces computational and memory overhead while preserving detection accuracy. The system is trained end to end on the COCO and GTSRB datasets and deployed on the KV260. Experimental results demonstrate a power consumption of only 3.5 W, an inference throughput of 9 FPS, and a mean Average Precision at IoU=0.5 (mAP@0.5) of 99%. Compared to the original YOLOv5, the implementation reduces LUT and BRAM utilization by approximately 62% and 58%, respectively. The proposed solution delivers high-accuracy, low-latency, and energy-efficient real-time object detection and classification, making it particularly suitable for safety-critical edge applications such as Advanced Driver Assistance Systems (ADAS).
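To make the low-bit quantization step concrete, the following is a minimal sketch of symmetric per-tensor quantization in plain Python. It is a generic illustration, not the scheme used in the paper: the function names `quantize_symmetric` and `dequantize`, the per-tensor scale, and the sample weights are all assumptions made for this example.

```python
def quantize_symmetric(weights, num_bits=8):
    """Map floats to signed integers using one scale for the whole tensor.

    Hypothetical helper for illustration; the paper's quantizer may differ.
    """
    qmax = 2 ** (num_bits - 1) - 1            # 127 for INT8, 7 for INT4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level, then clamp to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer levels."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0]              # made-up sample values
q8, scale8 = quantize_symmetric(weights, num_bits=8)
q4, scale4 = quantize_symmetric(weights, num_bits=4)
err8 = max(abs(w - r) for w, r in zip(weights, dequantize(q8, scale8)))
err4 = max(abs(w - r) for w, r in zip(weights, dequantize(q4, scale4)))
print("INT8 max error:", err8)
print("INT4 max error:", err4)
```

With fewer bits the step size (`scale`) grows, so the worst-case rounding error grows with it. That is the accuracy cost that a low-bit design has to trade against its savings in memory and logic.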
Abstract
Object detection and classification are crucial tasks across various application domains, particularly in the development of safe and reliable Advanced Driver Assistance Systems (ADAS). Existing deep learning-based methods such as Convolutional Neural Networks (CNNs), Single Shot Detectors (SSDs), and You Only Look Once (YOLO) have demonstrated high performance in terms of accuracy and computational speed when deployed on Field-Programmable Gate Arrays (FPGAs). However, despite these advances, state-of-the-art YOLO-based object detection and classification systems continue to face challenges in achieving resource efficiency suitable for edge FPGA platforms. To address this limitation, this paper presents a resource-efficient real-time object detection and classification system based on YOLOv5 optimized for FPGA deployment. The proposed system is trained on the COCO and GTSRB datasets and implemented on the Xilinx Kria KV260 FPGA board. Experimental results demonstrate a classification accuracy of 99%, with a power consumption of 3.5 W and a processing speed of 9 frames per second (FPS). These findings highlight the effectiveness of the proposed approach in enabling real-time, resource-efficient object detection and classification for edge computing applications.
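The reported power and throughput figures can be combined into per-frame quantities. The sketch below is a back-of-the-envelope calculation, not a result from the paper. It assumes that 3.5 W is the average power during inference and that frames are processed one at a time, so per-frame time is simply the reciprocal of throughput. The helper name `per_frame_metrics` is hypothetical.

```python
def per_frame_metrics(power_watts, fps):
    """Derive per-frame time (ms) and energy (J) from average power and throughput."""
    frame_time_ms = 1000.0 / fps       # assumes no overlap between frames
    energy_j = power_watts / fps       # W divided by frames/s gives J per frame
    return frame_time_ms, energy_j


# Values reported in the abstract: 3.5 W and 9 FPS.
frame_time_ms, energy_j = per_frame_metrics(3.5, 9.0)
print(f"{frame_time_ms:.1f} ms per frame, {energy_j:.3f} J per frame")
# prints "111.1 ms per frame, 0.389 J per frame"
```

Under these assumptions, each frame takes roughly 111 ms and costs about 0.39 J. If the accelerator overlaps frames in a pipeline, the true end-to-end latency of a single frame could be higher than this figure.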