Understanding Accelerator Compilers via Performance Profiling

📅 2025-11-24
🤖 AI Summary
Accelerator design language (ADL) compilers produce hardware whose performance is hard to predict: they must bridge the semantic gap between high-level abstractions and cycle-level schedules, and they rely on heuristic optimizations. This work introduces Petal, a cycle-level profiler for the Calyx intermediate language. Petal bridges the abstraction gap in three steps: it instruments Calyx code with probes, collects a trace from a register-transfer-level (RTL) simulation, and maps the trace events back to the high-level control constructs that were active in each clock cycle. Rather than assuming the compiler can be made perfect, it exposes how concrete compilation decisions affect performance. In case studies on existing accelerator designs, Petal identifies performance problems and guides optimizations that the compiler could not perform automatically, including a 46.9% reduction in total cycles for one application.

📝 Abstract
Accelerator design languages (ADLs), high-level languages that compile to hardware units, help domain experts quickly design efficient application-specific hardware. ADL compilers optimize datapaths and convert software-like control flow constructs into control paths. Such compilers are necessarily complex and often unpredictable: they must bridge the wide semantic gap between high-level semantics and cycle-level schedules, and they typically rely on advanced heuristics to optimize circuits. The resulting performance can be difficult to control, requiring guesswork to find and resolve performance problems in the generated hardware. We conjecture that ADL compilers will never be perfect: some performance unpredictability is endemic to the problem they solve. In lieu of compiler perfection, we argue for compiler understanding tools that give ADL programmers insight into how the compiler's decisions affect performance. We introduce Petal, a cycle-level profiler for the Calyx intermediate language (IL). Petal instruments the Calyx code with probes and then analyzes the trace from a register-transfer-level simulation. It maps the events in the trace back to high-level control constructs in the Calyx code to track the clock cycles when each construct was active. Using case studies, we demonstrate that Petal's cycle-level profiles can identify performance problems in existing accelerator designs. We show that these insights can also guide developers toward optimizations that the compiler was unable to perform automatically, including a 46.9% reduction in total cycles for one application.
Problem

Research questions and friction points this paper is trying to address.

ADL compilers exhibit unpredictable performance due to complex optimizations
Performance problems in generated hardware require guesswork to resolve
Developers lack tools to understand compiler decisions affecting performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Instruments Calyx code with probes
Analyzes the trace from a register-transfer-level simulation
Maps trace events back to high-level control constructs to track the cycles in which each was active
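
The three steps above can be illustrated with a minimal Python sketch. It is not Petal's implementation: the trace format (one set of active probe names per clock cycle), the probe names, and the probe-to-construct table are all assumptions made up for illustration. It shows only the general idea of attributing simulated cycles to the control constructs whose probes were high.

```python
from collections import defaultdict

# Hypothetical trace format: one set of active probe names per clock cycle.
# Probe names and the probe-to-construct table are invented for illustration;
# they are not Petal's actual output format.
def active_cycles(trace, probe_to_construct):
    """Return, per control construct, the list of cycles in which it was active."""
    cycles = defaultdict(list)
    for cycle, active_probes in enumerate(trace):
        # Several probes may belong to one construct; count the cycle once.
        constructs = {
            probe_to_construct[p] for p in active_probes if p in probe_to_construct
        }
        for construct in constructs:
            cycles[construct].append(cycle)
    return dict(cycles)

def summarize(cycles, total_cycles):
    """Return (cycle count, share of total runtime) for each construct."""
    return {c: (len(cs), len(cs) / total_cycles) for c, cs in cycles.items()}

# Assumed mapping from inserted probes to source-level control constructs.
probe_to_construct = {
    "probe_seq0": "seq@main",
    "probe_while0": "while@main",
    "probe_body_a": "while@main",
}

# Assumed five-cycle simulation trace.
trace = [
    {"probe_seq0"},
    {"probe_seq0", "probe_while0"},
    {"probe_seq0", "probe_while0", "probe_body_a"},
    {"probe_seq0", "probe_while0"},
    set(),
]

cycles = active_cycles(trace, probe_to_construct)
summary = summarize(cycles, len(trace))
for construct, (count, share) in sorted(summary.items()):
    print(f"{construct}: {count} cycles ({share:.0%} of total)")
```

A per-construct cycle count of this kind is what lets a developer see which part of the control program dominates runtime, and therefore where a manual optimization is most likely to pay off.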
Ayaka Yorihiro
Cornell University
Griffin Berlstein
Cornell University
Pedro Pontes García
Cornell University
Kevin Laeufer
Cornell University
Adrian Sampson
Computer Science, Cornell University
approximate computing, computer architecture, programming languages, energy efficiency, hardware–software co-design