🤖 AI Summary
This paper addresses the challenge of automatically mapping PyTorch models to synthesizable hardware. The authors propose an open-source, end-to-end compilation toolchain that integrates Allo (an accelerator design language), Calyx (a hardware intermediate representation), and CIRCT (an LLVM-based hardware compilation framework). A key methodological contribution is a memory-partitioning compilation pass tailored to memory-intensive machine learning workloads, which increases data parallelism and improves on-chip memory efficiency. The core contributions are threefold: (1) the first fully automated, synthesizable translation from a PyTorch frontend to SystemVerilog RTL; (2) a memory optimization strategy that preserves functional correctness while substantially reducing off-chip bandwidth pressure; and (3) experimental validation showing that the generated FPGA implementations achieve throughput and resource utilization comparable to industrial-grade, closed-source tools such as Vitis HLS.
📝 Abstract
We present an end-to-end open-source compiler toolchain that translates ML models written in PyTorch into synthesizable SystemVerilog. Our toolchain leverages the accelerator design language Allo, the hardware intermediate representation (IR) Calyx, and the CIRCT project under LLVM. We also implement a set of compiler passes for memory partitioning, enabling effective parallelism in memory-intensive ML workloads. Experimental results demonstrate that our compiler generates optimized, FPGA-implementable hardware designs whose performance is competitive with closed-source, industry-grade tools such as Vitis HLS.
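To give a sense of what the memory-partitioning passes do, the sketch below illustrates cyclic partitioning, a standard HLS transform of the kind such passes apply: a single buffer is split across several banks so that consecutive elements live in distinct memories and can be accessed in parallel. This is an illustrative model only, not the paper's actual pass; the function name and bank layout are our own assumptions.

```python
def cyclic_partition(data, factor):
    """Illustrative cyclic partitioning: element i of a 1-D buffer
    is placed in bank (i % factor), so `factor` consecutive elements
    always fall into `factor` distinct banks."""
    banks = [[] for _ in range(factor)]
    for i, value in enumerate(data):
        banks[i % factor].append(value)
    return banks

# With factor=4, any four consecutive elements land in four different
# banks, so a loop unrolled by 4 can read them in the same cycle
# instead of serializing accesses to one single-ported memory.
banks = cyclic_partition(list(range(8)), 4)
# banks == [[0, 4], [1, 5], [2, 6], [3, 7]]
```

In hardware terms, each inner list corresponds to a separate on-chip BRAM, trading a fixed partition factor for multiplied memory bandwidth.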