🤖 AI Summary
This work proposes a serial 8-bit hardware accelerator built on a shared multi-precision datapath to meet the low-power, low-latency requirements of real-time acoustic drone detection and continuous monitoring on edge devices. The design uses a layer-wise, 1D feature-driven CNN architecture that combines mixed-precision quantization (FP32/BF16/INT8/FXP8), structured channel pruning, and serialized dense-layer processing to reduce computational cost while preserving accuracy. Implemented on a Pynq-Z2 FPGA, the system achieves 89.91% detection accuracy in FP32 mode with less than 2.5% accuracy degradation in the 8-bit modes, consuming 0.94 W with an end-to-end latency of 116 ms. It also uses 5–9% fewer logic resources than parallel counterparts. ASIC synthesis in UMC 40 nm technology yields a core area of 3.29 mm², a maximum operating frequency of 1.56 GHz, and a total power consumption of 1.65 W.
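To make the four precision modes concrete, the minimal Python sketch below quantizes a value in each mode and maps it back to the real number it represents. It is an illustration under stated assumptions, not the accelerator's datapath: INT8 is assumed to be symmetric per-tensor quantization with a hypothetical scale of 0.05, FXP8 is assumed to be a signed Q3.4 format (4 fractional bits), and BF16 is assumed to keep the upper 16 bits of FP32 with round-to-nearest-even. The source does not specify any of these details, and the helper names (`fp32_to_bf16`, `int8_quant`, `fxp8_quant`, `simulate`) are hypothetical.

```python
import struct

# Illustrative sketch only: the INT8 scale and the FXP8 Q-format below are
# hypothetical choices, not parameters taken from SHIELD8-UAV.

def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fp32_to_bf16(x: float) -> float:
    """Round x to FP32, then keep the upper 16 bits with round-to-nearest-even.
    NaN/Inf inputs are not treated specially in this sketch."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1                     # tie-break bit for round-to-nearest-even
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000  # round, then zero the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def int8_quant(x: float, scale: float) -> int:
    """Assumed symmetric INT8: code = clamp(round(x / scale), -128, 127).
    Python's round() breaks exact ties to the even integer."""
    return max(-128, min(127, round(x / scale)))

def fxp8_quant(x: float, frac_bits: int = 4) -> int:
    """Assumed signed 8-bit fixed point with frac_bits fractional bits (Q3.4 by default)."""
    return max(-128, min(127, round(x * (1 << frac_bits))))

def simulate(x: float, mode: str, int8_scale: float = 0.05, frac_bits: int = 4) -> float:
    """Return the real value represented after quantizing x in the given mode."""
    if mode == "FP32":
        return to_fp32(x)
    if mode == "BF16":
        return fp32_to_bf16(x)
    if mode == "INT8":
        return int8_quant(x, int8_scale) * int8_scale
    if mode == "FXP8":
        return fxp8_quant(x, frac_bits) / (1 << frac_bits)
    raise ValueError(f"unknown mode: {mode}")

if __name__ == "__main__":
    for mode in ("FP32", "BF16", "INT8", "FXP8"):
        print(mode, simulate(3.14159265, mode))
```

With these assumed parameters, 3.14159265 becomes 3.140625 in BF16 and 3.125 in FXP8. The representation error differs from mode to mode and from value range to value range, which is the kind of trade-off a layer-sensitivity framework would weigh when assigning a precision to each layer.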
📝 Abstract
Real-time unmanned aerial vehicle (UAV) acoustic detection at the edge demands low-latency inference under strict power and hardware constraints. This paper presents SHIELD8-UAV, a sequential 8-bit hardware implementation of a precision-aware 1D feature-driven CNN (1D-F-CNN) accelerator for continuous acoustic monitoring. The design executes layer by layer on a shared multi-precision datapath, eliminating the need for replicated processing elements. A layer-sensitivity quantisation framework supports FP32, BF16, INT8, and FXP8 modes, while structured channel pruning reduces the flattened feature dimension from 35,072 to 8,704 (a 75% reduction), thereby lowering the number of serialised dense-layer cycles. The model achieves 89.91% detection accuracy in FP32 with less than 2.5% degradation in the 8-bit modes. On a Pynq-Z2 FPGA, the accelerator uses 2,268 LUTs and consumes 0.94 W, with an end-to-end latency of 116 ms; this is a 37.8% and 49.6% latency reduction compared with QuantMAC and LPRE, respectively, and the design uses 5–9% less logic than parallel designs. ASIC synthesis in UMC 40 nm technology shows a maximum operating frequency of 1.56 GHz, a 3.29 mm² core area, and 1.65 W total power. These results demonstrate that sequential execution combined with precision-aware quantisation and serialisation-aware pruning enables practical low-energy edge inference without relying on massive parallelism.
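The effect of pruning on a serialised dense layer can be checked with simple arithmetic. The sketch below assumes, hypothetically, that the first dense layer runs on a single shared MAC unit performing one multiply-accumulate per cycle, so its cycle count is the flattened input dimension times the number of output neurons. The output width of 64 and the helper name `serial_dense_cycles` are placeholders, not values or names from the paper. Under this cost model the cycle count scales linearly with the flattened dimension, so shrinking it from 35,072 to 8,704 removes roughly 75% of the dense-layer cycles whatever the output width is.

```python
from fractions import Fraction

FLAT_BEFORE = 35_072  # flattened feature dimension before pruning (from the abstract)
FLAT_AFTER = 8_704    # flattened feature dimension after pruning (from the abstract)

def serial_dense_cycles(in_features: int, out_features: int, macs_per_cycle: int = 1) -> int:
    """Cycles for a dense layer when every weight passes through a shared MAC datapath.

    Hypothetical cost model: each MAC unit computes one weight-activation product
    per cycle; pipeline fill, memory stalls, bias addition and activation are ignored.
    """
    total_macs = in_features * out_features
    return -(-total_macs // macs_per_cycle)  # ceiling division

if __name__ == "__main__":
    out_features = 64  # placeholder output width, not taken from the paper
    before = serial_dense_cycles(FLAT_BEFORE, out_features)
    after = serial_dense_cycles(FLAT_AFTER, out_features)
    removed = 1 - Fraction(FLAT_AFTER, FLAT_BEFORE)  # exact fraction of cycles removed
    print(f"cycles before pruning: {before:,}")
    print(f"cycles after pruning:  {after:,}")
    print(f"fraction of cycles removed: {float(removed):.2%}")
    print(f"speed-up of this layer: {before / after:.2f}x")
```

The removed fraction is exactly 103/137 (about 75.2%), consistent with the 75% figure quoted above. End-to-end latency also depends on the convolution layers, memory traffic, and control overhead, which this simple model ignores.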