🤖 AI Summary
Existing RTL generation frameworks for tensor acceleration in multimodal foundation models and generative AI exhibit a significant trade-off between flexibility and automation. Method: We propose an end-to-end spatial-architecture auto-synthesis methodology that unifies computation, dataflow, and memory-hierarchy modeling via affine transformations, integrates multi-strategy spatial dataflow design, and abstracts the hardware structure as a graph. Using data reuse analysis and linear programming, it jointly optimizes pipeline register insertion and resource scheduling, enabling fully automated memory-system synthesis while suppressing logic overhead under synthesizability constraints. Contribution/Results: Experiments show that our approach achieves 3.2× higher throughput and 2.4× better energy efficiency on average compared to Gemmini, and supports unified, efficient hardware mapping across diverse generative AI models.
📝 Abstract
Modern tensor applications, especially foundation models and generative AI workloads, process multiple input modalities (both vision and language), which increases the demand for flexible accelerator architectures. Existing frameworks suffer from a trade-off between design flexibility and RTL-generation productivity: they are either limited to a few hand-written templates or unable to generate RTL automatically. To address this challenge, we propose the LEGO framework, which targets tensor applications, automatically generates spatial architecture designs, and outputs synthesizable RTL code without hand-written RTL design templates. Leveraging an affine-transformation-based architecture representation, the LEGO front end finds interconnections between function units, synthesizes the memory system, and fuses different spatial dataflow designs based on data reuse analysis. The LEGO back end then translates the hardware into a primitive-level graph to perform lower-level optimizations, and applies a set of linear-programming algorithms to optimally insert pipeline registers and reduce the overhead of unused logic when switching spatial dataflows. Our evaluation demonstrates that LEGO achieves a 3.2x speedup and 2.4x better energy efficiency compared to the prior work Gemmini, and can generate a single architecture for diverse modern foundation models in generative AI applications.
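To make the affine-transformation idea concrete, here is a minimal sketch (our illustration with hypothetical names, not LEGO's actual representation or API): a spatial dataflow can be described by an affine matrix `T` that maps each loop iteration `(i, j, k)` of a matmul `C[i][j] += A[i][k] * B[k][j]` to a space-time coordinate `(pe_x, pe_y, t)`, so different dataflow choices are just different matrices.

```python
# Illustrative sketch of an affine dataflow mapping (assumed formulation,
# not LEGO's concrete data structures).

def affine_map(T, index):
    """Apply the linear transform T to a loop-iteration index vector.

    Returns [pe_x, pe_y, t]: which processing element the iteration is
    placed on, and the cycle at which it executes.
    """
    return [sum(T[r][c] * index[c] for c in range(len(index)))
            for r in range(len(T))]

# Output-stationary dataflow: i selects the PE row, j the PE column,
# and k advances time, so each PE accumulates one C[i][j] in place.
T_output_stationary = [
    [1, 0, 0],  # pe_x = i
    [0, 1, 0],  # pe_y = j
    [0, 0, 1],  # t    = k
]

# A systolic variant of the same placement: skewing time as t = i + j + k
# staggers when neighboring PEs fire, modeling wavefront operand arrival.
T_systolic = [
    [1, 0, 0],  # pe_x = i
    [0, 1, 0],  # pe_y = j
    [1, 1, 1],  # t    = i + j + k
]

# Iteration (i=2, j=1, k=5) lands on PE (2, 1) in both dataflows,
# but executes at cycle 5 vs. cycle 8.
print(affine_map(T_output_stationary, [2, 1, 5]))  # → [2, 1, 5]
print(affine_map(T_systolic, [2, 1, 5]))           # → [2, 1, 8]
```

Under this kind of formulation, questions like "which function units must be interconnected" and "which operands are reused across neighboring PEs or cycles" reduce to linear-algebraic analysis of `T` and the tensors' access functions, which is what makes automated synthesis and dataflow fusion tractable.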