Dynamic Loop Fusion in High-Level Synthesis

📅 2025-01-24
🤖 AI Summary
Problem: In high-level synthesis (HLS), sibling loops with unpredictable memory dependencies are executed sequentially, which severely limits parallelism. Method: This paper proposes dynamic loop fusion, a compiler/hardware co-design approach that combines a program-order schedule inspired by polyhedral compilers with an address monotonicity analysis. Together, these resolve unpredictable memory dependencies at runtime without searching address histories or serializing loops; the only requirement is that memory addresses are monotonically non-decreasing in inner loops. Contribution/Results: The approach lets multiple sibling loops run in parallel in dynamic HLS even when they contain unpredictable memory dependencies. The evaluation reports an average speedup of 14× over static HLS and 4× over dynamic HLS.
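The monotonicity requirement mentioned above can be made concrete with a small check. This is a minimal sketch, not the paper's analysis: the function name `is_non_decreasing` and both address streams are hypothetical, and the paper's analysis is described as compiler-side rather than a check over a recorded trace.

```python
def is_non_decreasing(addresses):
    """Return True if every address is >= the one before it."""
    return all(prev <= cur for prev, cur in zip(addresses, addresses[1:]))


# Hypothetical inner-loop address streams, for illustration only.
forward_walk = [0, 2, 2, 5, 9]  # only moves forward; repeats are allowed
scattered = [7, 1, 4, 4, 0]     # arbitrary data-dependent accesses

print(is_non_decreasing(forward_walk))  # True
print(is_non_decreasing(scattered))     # False
```

Repeated addresses pass the check because the requirement is non-decreasing, not strictly increasing.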

📝 Abstract
Dynamic High-Level Synthesis (HLS) uses additional hardware to perform memory disambiguation at runtime, increasing loop throughput in irregular codes compared to static HLS. However, most irregular codes consist of multiple sibling loops, which currently have to be executed sequentially by all HLS tools. Static HLS performs loop fusion only on regular codes, while dynamic HLS relies on loops with dependencies to run to completion before the next loop starts. We present dynamic loop fusion for HLS, a compiler/hardware co-design approach that enables multiple loops to run in parallel, even if they contain unpredictable memory dependencies. Our only requirement is that memory addresses are monotonically non-decreasing in inner loops. We present a novel program-order schedule for HLS, inspired by polyhedral compilers, that together with our address monotonicity analysis enables dynamic memory disambiguation that does not require searching of address histories and sequential loop execution. Our evaluation shows an average speedup of 14$\times$ over static and 4$\times$ over dynamic HLS.
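To illustrate why monotonic addresses can remove the need for an address-history search, here is a simplified software model, not the paper's hardware design. It assumes one producer loop that only stores and one consumer loop that only loads (read-after-write dependencies only), with store addresses non-decreasing. The names `run_sequential` and `run_fused` are hypothetical. The key observation in this model is that a load is safe as soon as the producer's next pending store address is strictly greater than the load address, so a single comparison is enough.

```python
def run_sequential(store_addrs, store_vals, load_addrs, memory):
    """Reference behaviour: finish the store loop, then run the load loop."""
    mem = list(memory)
    for addr, val in zip(store_addrs, store_vals):
        mem[addr] = val
    return [mem[addr] for addr in load_addrs]


def run_fused(store_addrs, store_vals, load_addrs, memory):
    """Interleave both loops; assumes store_addrs is non-decreasing.

    Returns the loaded values and how many loads were issued while the
    store loop was still running.
    """
    mem = list(memory)
    loaded = []
    overlapped = 0
    s = 0  # index of the next pending store
    for addr in load_addrs:
        # Commit stores until no pending store can still write to addr.
        # Because store addresses never decrease, checking only the next
        # pending store is sufficient; no history search is needed.
        while s < len(store_addrs) and store_addrs[s] <= addr:
            mem[store_addrs[s]] = store_vals[s]
            s += 1
        if s < len(store_addrs):
            overlapped += 1  # this load ran ahead of later stores
        loaded.append(mem[addr])
    return loaded, overlapped


store_addrs = [0, 2, 2, 5]
store_vals = [10, 20, 30, 40]
load_addrs = [0, 1, 2, 3, 6]
memory = [0] * 8

expected = run_sequential(store_addrs, store_vals, load_addrs, memory)
fused, overlapped = run_fused(store_addrs, store_vals, load_addrs, memory)
print(fused == expected)  # True
print(overlapped)         # 4
```

In this example four of the five loads are issued before the store loop has finished, yet the loaded values match sequential execution. A real design would also have to handle other dependency types and nested loops, which this sketch leaves out.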
Problem

Research questions and friction points this paper is trying to address.

High-Level Synthesis
Loop Fusion
Memory Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Loop Fusion
Compiler-Hardware Co-design
Irregular Code Parallelization