Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming Approach

📅 2024-04-01
🏛️ Symposium on Field Programmable Gate Arrays
📈 Citations: 2
Influential: 0

🤖 AI Summary
Manual pragma configuration in high-level synthesis (HLS) is inefficient, and the number of possible pragma combinations grows exponentially. Method: This paper proposes a nonlinear programming (NLP)-based framework for automatic pragma insertion that jointly optimizes loop-level pragmas, including pipelining, coarse- and fine-grained unit replication, and data caching. It builds analytical performance and resource models, proves that the latency model is a lower bound on actual performance, and encodes the pragma configuration as the unknowns of an NLP whose solution gives an optimal configuration. Integrated with pragma semantic analysis and the Merlin compiler, and augmented by design-space pruning, the framework explores billion-scale design spaces within seconds to minutes. Contribution/Results: Experimental evaluation shows kernel performance approaching hand-tuned implementations, resource estimation error <8%, and latency lower-bound error ≤12%.
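To make the idea of an analytical model over pragma variables concrete, here is a minimal Python sketch under loudly stated assumptions. It considers a single loop whose only pragmas are an unroll factor `uf` (unit replication) and an on/off pipeline with initiation interval 1. The `Loop` fields, the latency and DSP formulas, and the DSP budget are hypothetical stand-ins, not the paper's model. Exhaustive enumeration of this tiny space also replaces the NLP solver the paper relies on.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Loop:
    """Hypothetical loop description used by the toy model (not the paper's)."""
    trip_count: int    # number of iterations (TC)
    iter_latency: int  # cycles for one iteration of the body (IL)
    dsp_per_copy: int  # DSPs used by one copy of the body


def divisors(n: int) -> list[int]:
    """Legal unroll factors in this toy: only those that evenly divide the trip count."""
    return [d for d in range(1, n + 1) if n % d == 0]


def latency(loop: Loop, uf: int, pipelined: bool, ii: int = 1) -> int:
    """Toy latency model: uf copies run in parallel, so TC/uf iterations remain."""
    iters = loop.trip_count // uf
    if pipelined:
        # A new iteration starts every ii cycles after the first one.
        return loop.iter_latency + ii * (iters - 1)
    # Without pipelining, iterations execute back to back.
    return loop.iter_latency * iters


def dsp(loop: Loop, uf: int) -> int:
    """Toy resource model: each replicated copy of the body costs its DSPs."""
    return loop.dsp_per_copy * uf


def best_config(loop: Loop, dsp_budget: int):
    """Enumerate (uf, pipelined) and keep the lowest-latency feasible choice.
    This brute force stands in for an NLP solver on a tiny space."""
    best = None
    for uf, pipelined in product(divisors(loop.trip_count), (False, True)):
        if dsp(loop, uf) > dsp_budget:
            continue  # violates the resource constraint
        lat = latency(loop, uf, pipelined)
        if best is None or lat < best[0]:
            best = (lat, uf, pipelined)
    return best  # (latency, unroll factor, pipelined), or None if nothing fits


print(best_config(Loop(trip_count=64, iter_latency=10, dsp_per_copy=5), dsp_budget=40))
# (17, 8, True)
```

For a 64-iteration loop with a 10-cycle body, 5 DSPs per copy and a 40-DSP budget, the largest affordable unroll factor combined with pipelining wins. Enumeration is only viable because this toy space has 14 points; formulating the search as an NLP is what lets the paper handle spaces of billions of designs.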

📝 Abstract
High-Level Synthesis enables the rapid prototyping of hardware accelerators by combining a high-level description of the functional behavior of a kernel with a set of micro-architecture optimizations, expressed as pragmas, as inputs. Such pragmas may describe the pipelining and replication of units, or even higher-level transformations for HLS such as automatic data caching using the AMD/Xilinx Merlin compiler. Selecting the best combination of pragmas, even within a restricted set, remains particularly challenging, and the typical state of practice uses design-space exploration (DSE) to navigate this space. But due to the highly irregular performance distribution of pragma configurations, typical DSE approaches are either extremely time-consuming or operate on a severely restricted search space. In this work we propose a framework to automatically insert HLS pragmas in regular loop-based programs, supporting pipelining, unit replication (coarse- and fine-grain), and data caching. We develop a simple analytical performance and resource model as a function of the input program properties and the pragmas inserted. We prove this model provides a lower bound on the actual performance for any possible configuration. We then encode this model as a Non-Linear Program (NLP), making the pragma configuration the unknowns of the system, and compute the optimal configuration by solving this NLP. This approach can also be used during DSE to quickly prune points with a (possibly partial) pragma configuration, employing this latency lower-bound property. We extensively evaluate our fully implemented end-to-end system, showing it can effectively manipulate spaces of billions of designs in seconds to minutes for the kernels evaluated.
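The pruning idea in the abstract can be sketched as a small branch-and-bound search. The assumptions here are hypothetical and not taken from the paper: the kernel is a sequence of loops executed one after another, each loop is pipelined with II = 1 and offers only an unroll-factor choice, total latency is the sum of loop latencies, and the only resource is a shared DSP budget. A partial configuration assigns unroll factors to the first few loops. Adding each unassigned loop's best-case latency gives a lower bound on every completion, so the whole subtree can be discarded once that bound reaches the best complete design found so far. In the paper the bound comes from its analytical model; a sum of per-loop minima stands in for it here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loop:
    """Hypothetical loop description; every loop is pipelined with II = 1."""
    trip_count: int
    iter_latency: int
    dsp_per_copy: int


def options(loop: Loop) -> list[tuple[int, int, int]]:
    """(latency, dsp, uf) for each unroll factor uf dividing the trip count."""
    opts = []
    for uf in range(1, loop.trip_count + 1):
        if loop.trip_count % uf == 0:
            lat = loop.iter_latency + (loop.trip_count // uf - 1)
            opts.append((lat, loop.dsp_per_copy * uf, uf))
    return opts


def explore(loops: list[Loop], dsp_budget: int):
    """Depth-first DSE with lower-bound pruning of partial configurations."""
    all_opts = [options(loop) for loop in loops]
    # Best-case latency of each loop, ignoring resources: a valid lower bound.
    min_lat = [min(lat for lat, _, _ in opts) for opts in all_opts]
    best_lat, best_ufs, pruned = None, None, 0

    def dfs(i: int, lat_so_far: int, dsp_so_far: int, ufs: list[int]) -> None:
        nonlocal best_lat, best_ufs, pruned
        if dsp_so_far > dsp_budget:
            return  # DSP use only grows, so no completion can fit
        # Lower bound on the latency of every completion of this partial configuration.
        bound = lat_so_far + sum(min_lat[i:])
        if best_lat is not None and bound >= best_lat:
            pruned += 1
            return  # cannot beat the incumbent: skip the whole subtree
        if i == len(loops):
            best_lat, best_ufs = lat_so_far, ufs  # complete and strictly better
            return
        for lat, d, uf in all_opts[i]:
            dfs(i + 1, lat_so_far + lat, dsp_so_far + d, ufs + [uf])

    dfs(0, 0, 0, [])
    return best_lat, best_ufs, pruned


lat, ufs, pruned = explore([Loop(64, 10, 5), Loop(32, 6, 3)], dsp_budget=40)
print(lat, ufs, pruned)
```

With two loops and a 40-DSP budget, the search settles on unroll factors `[4, 4]` for a total latency of 38, and several partial configurations are discarded without being expanded. Because the bound ignores resources it is cheap to compute and never overestimates, which is the property that makes pruning safe.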
Problem

Research questions and friction points this paper is trying to address.

Automatic insertion of HLS pragmas
Optimizing pipelining and data caching
Non-linear programming for hardware synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-Linear Programming for pragma insertion
Analytical performance and resource modeling
Automatic pragma optimization in HLS