🤖 AI Summary
The hardware design domain suffers from a scarcity of high-quality instruction-code pairs, reproducible benchmarks, and robust functional correctness verification mechanisms for LLM-assisted RTL design. Method: This paper introduces the first open-source dataset and benchmarking framework tailored for LLM-powered RTL design. It proposes RTLLM 2.0 (for RTL code generation) and AssertEval (for assertion generation) as dual-task benchmarks, and develops a novel RTL-simulation-based data quality filtering method to curate a 7K-sample dataset of high-confidence, functionally verified examples. Contribution/Results: Through systematic instruction-code pair construction, rigorous data cleaning, targeted model fine-tuning, and comprehensive evaluation, the framework significantly improves functional correctness in LLM-generated RTL. Experiments demonstrate that synergistic optimization of data scale, quality, and training strategy systematically enhances model performance—establishing a reproducible, verifiable infrastructure to advance LLMs in hardware design.
📝 Abstract
The automated generation of design RTL from large language models (LLMs) and natural language instructions has demonstrated great potential in agile circuit design. However, the lack of publicly available datasets and benchmarks hinders the development and fair evaluation of LLM solutions. This paper highlights our latest advances in open datasets and benchmarks from three perspectives: (1) RTLLM 2.0, an updated benchmark assessing LLMs' capability in design RTL generation. The benchmark has been expanded to 50 hand-crafted designs, each providing a design description, test cases, and a correct RTL implementation. (2) AssertEval, an open-source benchmark assessing LLMs' assertion-generation capabilities for RTL verification. The benchmark includes 18 designs, each providing a specification, signal definitions, and correct RTL code. (3) RTLCoder-Data, an extended open-source dataset with 80K instruction-code samples. Moreover, we propose a new verification-based method to check the functional correctness of training data samples. Based on this technique, we further release a dataset of 7K verified high-quality samples. These three studies are integrated into one framework, providing off-the-shelf support for developing and evaluating LLMs for RTL code generation and verification. Finally, extensive experiments indicate that LLM performance can be boosted by enlarging the training dataset, improving data quality, and refining the training scheme.
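The verification-based data filtering described above can be sketched as a simple pass over instruction-code pairs, keeping only samples whose RTL passes its testbench in simulation. This is a minimal sketch, not the paper's actual pipeline: the `passes_simulation` hook is a hypothetical callable (e.g. a wrapper that invokes an RTL simulator such as Icarus Verilog and checks the testbench result), and the sample schema is assumed.

```python
def filter_verified(samples, passes_simulation):
    """Keep only instruction-code pairs whose RTL passes simulation.

    samples: iterable of dicts with 'instruction', 'code', and
        'testbench' keys (assumed schema, for illustration only).
    passes_simulation: callable(code, testbench) -> bool, a hypothetical
        hook around an RTL simulator that compiles the design with its
        testbench and reports whether all checks pass.
    """
    verified = []
    for sample in samples:
        try:
            if passes_simulation(sample["code"], sample["testbench"]):
                verified.append(sample)
        except Exception:
            # Treat simulator failures (syntax errors, timeouts,
            # missing modules) as a failed verification: the sample
            # is dropped rather than kept unverified.
            continue
    return verified


# Usage with a stub simulator hook (stands in for a real simulator call):
samples = [
    {"instruction": "4-bit adder", "code": "good_rtl", "testbench": "tb"},
    {"instruction": "broken fifo", "code": "bad_rtl", "testbench": "tb"},
]
kept = filter_verified(samples, lambda code, tb: code == "good_rtl")
```

Injecting the simulator as a callable keeps the filter itself simulator-agnostic, so the same loop works whether correctness is judged by a commercial tool, an open-source simulator, or a cached verdict.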