Structured Testbench Generation for LLM-Driven HDL Design and Verification-Oriented Data Curation

📅 2026-06-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses bottlenecks in automated testbench generation for LLM-driven RTL design flows: stochastic outputs, high token cost, poor reproducibility, and insufficient coverage. The authors propose the Structured Testbench Generation (STG) framework, which exploits the inherent structure of hardware designs to produce deterministic testbenches. STG serves three roles: a direct verification tool, a data curation engine, and a test-time scaling oracle. As a verification tool, it runs 720× faster than an iterative LLM-based testbench generation flow, achieves a higher compilation success rate and higher coverage, and reduces false-pass verdicts on incorrect DUTs. As a data curation engine, it is 11× faster than LLM-based filtering on a single CPU core and uses 127× less energy, and the resulting distilled models attain state-of-the-art performance in the authors' multi-benchmark evaluation. As a test-time scaling oracle, it reduces node count by 14-47%.

📝 Abstract

Automated testbench generation has become a critical bottleneck in large language model (LLM)-driven Register Transfer Level (RTL) workflows, where large numbers of candidate designs must be verified rapidly and reliably. Existing prompt-based approaches treat testbench generation as unconstrained code synthesis, yielding stochastic outputs with high token cost, low reproducibility, and insufficient coverage. To address this gap, we present STG, a Structured Testbench Generation framework that exploits the inherent structure of hardware designs to generate deterministic testbenches. As a direct verification tool, STG runs 720x faster than an iterative LLM-based testbench generation flow, attains a higher rate of successful compilation, achieves higher coverage, and reduces false-pass verdicts on incorrect DUTs. STG also helps identify errors in RTL generation benchmarks by exposing faulty benchmark testbenches. As a data curation engine, it is 11x faster than LLM-based filtering on a single CPU core with 127x less energy, and the resulting distilled models provide state-of-the-art performance in our multi-benchmark evaluation. As a test-time scaling oracle, it reduces node count by 14-47%. Our models are available at https://huggingface.co/collections/AS-SiliconMind/siliconmind-v12.
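The abstract does not spell out how STG turns a design's structure into a testbench, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a combinational DUT described by a hand-written port list and derives a Verilog stimulus testbench from that list alone, using a fixed seed so the same ports always yield the same text. The names `Port` and `make_testbench` are hypothetical, and the generated testbench only drives inputs; it does not check outputs.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """Hypothetical port description: name, direction, and bit width."""
    name: str
    direction: str  # "input" or "output"
    width: int = 1


def make_testbench(module, ports, n_vectors=4, seed=0):
    """Build a Verilog stimulus testbench from the port list alone (illustrative)."""
    rng = random.Random(seed)  # fixed seed: identical ports give identical text
    inputs = [p for p in ports if p.direction == "input"]
    lines = ["`timescale 1ns/1ps", f"module tb_{module};"]
    # Inputs are driven from regs, outputs are observed on wires.
    for p in ports:
        kind = "reg" if p.direction == "input" else "wire"
        span = f"[{p.width - 1}:0] " if p.width > 1 else ""
        lines.append(f"  {kind} {span}{p.name};")
    # Named port connections follow directly from the port list.
    conns = ", ".join(f".{p.name}({p.name})" for p in ports)
    lines.append(f"  {module} dut ({conns});")
    lines.append("  initial begin")
    for _ in range(n_vectors):
        for p in inputs:
            value = rng.getrandbits(p.width)
            lines.append(f"    {p.name} = {p.width}'h{value:x};")
        lines.append("    #10;")
    lines += ["    $finish;", "  end", "endmodule"]
    return "\n".join(lines)


if __name__ == "__main__":
    adder_ports = [Port("a", "input", 8), Port("b", "input", 8), Port("sum", "output", 9)]
    print(make_testbench("adder", adder_ports))
```

Because nothing in this sketch depends on a sampled model output, calling `make_testbench` twice with the same arguments returns byte-identical text, which is the reproducibility property the abstract contrasts with prompt-based generation.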

Problem

Research questions and friction points this paper is trying to address.

testbench generation

LLM-driven HDL design

verification

structured generation

data curation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured Testbench Generation

LLM-driven HDL Verification

Deterministic Testbench Synthesis