A New Benchmark for the Appropriate Evaluation of RTL Code Optimization

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work proposes RTL-OPT, a new benchmark addressing the limitations of existing evaluations that primarily focus on syntactic correctness of RTL code while inadequately assessing power, performance, and area (PPA) optimization quality. RTL-OPT comprises 36 handcrafted digital circuit tasks, each paired with unoptimized and expert-optimized RTL implementations. The benchmark introduces an end-to-end automated evaluation pipeline that integrates formal functional equivalence checking with quantitative PPA analysis. It enables the first systematic assessment of large language models' capabilities in RTL optimization, incorporating optimization patterns reflective of industrial practice and capturing dimensions often overlooked by conventional synthesis tools. RTL-OPT thus provides a standardized, quantifiable platform for evaluating LLM-driven hardware design optimization.

πŸ“ Abstract
The rapid progress of artificial intelligence increasingly relies on efficient integrated circuit (IC) design. Recent studies have explored the use of large language models (LLMs) for generating Register Transfer Level (RTL) code, but existing benchmarks mainly evaluate syntactic correctness rather than optimization quality in terms of power, performance, and area (PPA). This work introduces RTL-OPT, a benchmark for assessing the capability of LLMs in RTL optimization. RTL-OPT contains 36 handcrafted digital designs spanning diverse implementation categories, including combinational logic, pipelined datapaths, finite state machines, and memory interfaces. Each task provides a pair of RTL implementations: a suboptimal version and a human-optimized reference that reflects industry-proven optimization patterns not captured by conventional synthesis tools. Furthermore, RTL-OPT integrates an automated evaluation framework to verify functional correctness and quantify PPA improvements, enabling standardized and meaningful assessment of generative models for hardware design optimization.
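To make the two-stage evaluation concrete, the sketch below emulates the shape of such a pipeline in Python: verify that an "optimized" design is functionally equivalent to the baseline, then compute relative PPA improvements. All names and numbers here are illustrative assumptions, not the paper's actual API or data; a real flow would invoke a formal equivalence checker and a synthesis tool rather than exhaustive simulation and hard-coded metrics.

```python
# Hypothetical sketch of an RTL-OPT-style evaluation step (illustrative only).
# Task: multiply an 8-bit value by the constant 10, truncated to 8 bits.

def baseline_mult10(x):
    # Suboptimal reference: a full constant multiply.
    return (x * 10) & 0xFF

def optimized_mult10(x):
    # Expert-style strength reduction: 10*x = 8*x + 2*x (two shifts and an add).
    return ((x << 3) + (x << 1)) & 0xFF

def equivalent(f, g, width=8):
    # Exhaustive "equivalence check" over the full input space; feasible only
    # for tiny widths, where a real pipeline would use formal methods.
    return all(f(x) == g(x) for x in range(2 ** width))

def ppa_improvement(baseline, optimized):
    # Relative improvement per metric; positive means the optimized design wins.
    return {k: (baseline[k] - optimized[k]) / baseline[k] for k in baseline}

# Made-up PPA figures, standing in for a synthesis tool's report.
baseline_ppa  = {"area_um2": 412.0, "power_mw": 1.9, "delay_ns": 2.4}
optimized_ppa = {"area_um2": 268.0, "power_mw": 1.2, "delay_ns": 1.8}

if equivalent(baseline_mult10, optimized_mult10):
    print(ppa_improvement(baseline_ppa, optimized_ppa))
```

The key design point the benchmark automates is the ordering: PPA numbers are only meaningful once functional equivalence has been established, since a broken design can trivially "improve" every metric.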
Problem

Research questions and friction points this paper is trying to address.

RTL optimization
large language models
PPA evaluation
hardware design
benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

RTL optimization
large language models
hardware design benchmark
PPA evaluation
automated verification