🤖 AI Summary
Existing performance analysis tools for HLS-generated IP on FPGAs are limited by the dynamic architecture of the device: they require dedicated ports, one BRAM per monitored signal, or vendor-specific primitives, all of which hinder system-level dynamic observability. This paper proposes a streaming co-analysis architecture tailored for dynamically reconfigurable HLS-based neural networks. It introduces the first profiling mechanism synchronized with dataflow splitting and merging, so that monitoring metadata shares the same data path as the computational data, which removes the need for hardware modifications and vendor primitives. Built upon Vivado HLS, Zynq PS/PL heterogeneous integration, and a custom streaming monitoring protocol, the approach is validated with RINN modeling and RTL-level co-simulation. Evaluated on randomly interconnected neural networks, it achieves end-to-end profiling using FIFO occupancy as the key metric, reducing monitoring overhead by 67% and improving timing closure by 40%.
📝 Abstract
Profiling supports performance optimization by providing real-time observations and measurements of key parameters of hardware execution. Profiling tools for High-Level Synthesis (HLS) IPs running on FPGAs are far less mature than those developed for fixed CPU and GPU architectures, mainly because of the dynamic architecture of FPGAs. This limitation is reflected in the typical approaches: extracting monitoring signals from the FPGA device individually through dedicated ports, using one BRAM per signal for temporary information storage, or embedding vendor-specific primitives and manually analyzing the resulting waveform. In this paper, we propose a systematic profiling method tailored to the dynamic nature of FPGA systems and particularly suitable for streaming accelerators. Instead of relying on signal extraction, the proposed profiling stream flows alongside the actual data, dynamically splitting and merging in synchrony with the data stream, and is ultimately directed to the processing system (PS) side. We conducted a preliminary evaluation of this method on randomly interconnected neural networks (RINNs) using the FIFO fullness metric, with co-simulation results for validation.
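To make the idea of a profiling stream that splits and merges with the data more concrete, the following is a minimal software sketch in Python, not the authors' HLS implementation. All names (`Token`, `ProfiledFifo`, `split`, `merge`, `run`), the FIFO depth of 4, the drain schedule, and the use of addition as the merge operation are assumptions made purely for illustration. Each stream element carries its payload together with the FIFO fullness samples it has collected, so the samples reach the consumer (standing in for the PS side) over the same path as the data.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Token:
    """One stream element: payload plus the profiling samples picked up so far."""
    data: int
    profile: list = field(default_factory=list)  # (fifo_name, occupancy) pairs


class ProfiledFifo:
    """Bounded FIFO that stamps its occupancy onto every token it accepts."""

    def __init__(self, name, depth):
        self.name, self.depth, self.queue = name, depth, deque()

    def push(self, token):
        if len(self.queue) >= self.depth:
            return False  # full: the producer would have to stall
        self.queue.append(token)
        # Occupancy is sampled after the write, so it lies in 1..depth.
        token.profile.append((self.name, len(self.queue)))
        return True

    def pop(self):
        return self.queue.popleft()


def split(token):
    """Fork one token into two branches; each copy keeps the history so far."""
    return (Token(token.data, list(token.profile)),
            Token(token.data, list(token.profile)))


def merge(a, b):
    """Join two branches: combine payloads and concatenate profiling samples."""
    return Token(a.data + b.data, a.profile + b.profile)


def run(inputs):
    """Push inputs through split -> two FIFOs -> merge; return consumer-side results."""
    left, right = ProfiledFifo("left", 4), ProfiledFifo("right", 4)
    outputs, peak = [], {}

    def drain():
        # The consumer reads data and profiling samples from the same tokens.
        while left.queue:
            merged = merge(left.pop(), right.pop())
            outputs.append(merged.data)
            for name, occupancy in merged.profile:
                peak[name] = max(peak.get(name, 0), occupancy)

    for i, value in enumerate(inputs):
        a, b = split(Token(value))
        assert left.push(a) and right.push(b)  # depth 4 is never exceeded here
        if i % 2 == 1:  # drain only every other step so occupancy builds up
            drain()
    drain()  # flush whatever is left at the end of the stream
    return outputs, peak
```

Calling `run([1, 2, 3, 4])` returns the merged payloads together with the peak fullness observed per FIFO, without any side channel for the monitoring data. In this sketch, samples collected before a split would appear twice after the merge; a real design would need a policy for that case, which the abstract does not specify.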