AI Summary
This work addresses critical bottlenecks in LLM-driven RTL design flows: stochastic outputs, high computational cost, poor reproducibility, and insufficient coverage in automated testbench generation. To overcome these challenges, the authors propose the Structured Testbench Generation (STG) framework, which exploits the inherent structure of hardware designs to produce deterministic testbenches. Beyond direct verification, STG serves as a CPU-efficient data curation engine and as an inference-time pruning oracle for test-time scaling. The method substantially improves verification efficiency and reliability: it generates testbenches 720× faster than an iterative LLM-based flow, achieves higher compilation success rates and coverage, and reduces false-pass verdicts on incorrect designs. For data curation, it is 11× faster than LLM-based filtering and uses 127× less energy. Across multiple benchmarks, models distilled from the curated data attain state-of-the-art performance.
Abstract
Automated testbench generation has become a critical bottleneck in large language model (LLM)-driven Register Transfer Level (RTL) workflows, where large numbers of candidate designs must be verified rapidly and reliably. Existing prompt-based approaches treat testbench generation as unconstrained code synthesis, yielding stochastic outputs with high token cost, low reproducibility, and insufficient coverage. To address this gap, we present STG, a Structured Testbench Generation framework that exploits the inherent structure of hardware designs to generate deterministic testbenches. As a direct verification tool, STG runs 720x faster than an iterative LLM-based testbench generation flow, achieves a higher compilation success rate and higher coverage, and reduces false-pass verdicts on incorrect designs under test (DUTs). STG also helps identify errors in RTL generation benchmarks by exposing faulty benchmark testbenches. As a data curation engine, it is 11x faster than LLM-based filtering on a single CPU core and uses 127x less energy, and the resulting distilled models achieve state-of-the-art performance in our multi-benchmark evaluation. As a test-time scaling oracle, it reduces node count by 14-47%. Our models are available at https://huggingface.co/collections/AS-SiliconMind/siliconmind-v12.
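The abstract does not describe STG's internals, so the following is only a hypothetical Python sketch, not STG's implementation, of why a deterministic stimulus set derived from a design's port structure gives reproducible pass/fail verdicts and can prune incorrect candidates. The helper names (`deterministic_stimuli`, `verify`, `prune_candidates`), the exhaustive-enumeration threshold, the fixed random seed, and the use of Python functions as stand-ins for simulated RTL are all illustrative assumptions.

```python
import itertools
import random
from typing import Callable, Dict, List

# Hypothetical stand-ins: a "model" maps input-port values to output-port values,
# playing the role of a simulated RTL design.
Stimulus = Dict[str, int]
Model = Callable[[Stimulus], Dict[str, int]]


def deterministic_stimuli(port_widths: Dict[str, int], exhaustive_limit: int = 12,
                          n_random: int = 256, seed: int = 0) -> List[Stimulus]:
    """Build a reproducible stimulus list from the input-port widths alone (widths >= 1)."""
    names = sorted(port_widths)  # fixed port order, so every run yields the same list
    total_bits = sum(port_widths[n] for n in names)
    if total_bits <= exhaustive_limit:
        # Small input space: enumerate every input combination.
        ranges = [range(1 << port_widths[n]) for n in names]
        return [dict(zip(names, combo)) for combo in itertools.product(*ranges)]
    # Large input space: corner cases (all zeros, all ones, MSB only) ...
    corners = (lambda w: 0, lambda w: (1 << w) - 1, lambda w: 1 << (w - 1))
    stimuli = [{n: pick(port_widths[n]) for n in names} for pick in corners]
    # ... plus random vectors from a fixed seed, so the list is still deterministic.
    rng = random.Random(seed)
    for _ in range(n_random):
        stimuli.append({n: rng.getrandbits(port_widths[n]) for n in names})
    return stimuli


def verify(candidate: Model, reference: Model, stimuli: List[Stimulus]) -> bool:
    """Pass only if the candidate matches the reference on every stimulus."""
    return all(candidate(s) == reference(s) for s in stimuli)


def prune_candidates(candidates: List[Model], reference: Model,
                     stimuli: List[Stimulus]) -> List[Model]:
    """Keep only candidates that pass; failing ones need no further expansion."""
    return [c for c in candidates if verify(c, reference, stimuli)]


# Usage: a 4-bit adder with a 5-bit sum, and a buggy version that drops the carry.
def ref_adder(s: Stimulus) -> Dict[str, int]:
    return {"sum": (s["a"] + s["b"]) & 0x1F}


def buggy_adder(s: Stimulus) -> Dict[str, int]:
    return {"sum": (s["a"] + s["b"]) & 0xF}


stim = deterministic_stimuli({"a": 4, "b": 4})  # 8 input bits -> exhaustive
print(len(stim))                                # 256
print(verify(buggy_adder, ref_adder, stim))     # False
print(len(prune_candidates([ref_adder, buggy_adder], ref_adder, stim)))  # 1
```

Because the stimulus list depends only on the port widths and a fixed seed, rerunning the check always gives the same verdict. A sparse hand-picked test set could miss the carry bug entirely, which is the kind of false pass the abstract refers to.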