BugGen: A Self-Correcting Multi-Agent LLM Pipeline for Realistic RTL Bug Synthesis

📅 2025-06-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Hardware verification suffers from a scarcity of realistic fault data, and existing fault-injection techniques struggle to simultaneously achieve diversity, scalability, and functional fidelity. This paper introduces the first fully automated, multi-agent LLM pipeline tailored to RTL design, featuring a closed-loop self-correction architecture that integrates modular partitioning, goal-directed mutation selection, iterative refinement, and rollback-based validation, enabling scalable, autonomous synthesis of high-fidelity functional bugs. Evaluated on five OpenTitan IP cores, the pipeline generated 500 unique bugs with 94% functional accuracy at a throughput of 17.7 validated bugs per hour, uncovered 104 previously undetected bugs in existing OpenTitan regressions, and enabled ML-based failure-triage models to reach classification accuracies of 88.1%-93.2%. To the authors' knowledge, this is the first systematic solution to constructing RTL-level bug datasets, establishing a high-quality, scalable data foundation for ML-assisted hardware debugging.
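The summary describes a closed-loop generate, insert, validate, and rollback flow. The paper's actual agent prompts and tooling are not reproduced in this listing; the sketch below only illustrates that control flow under stated assumptions, with hypothetical names such as propose_mutation, detected_by_regression, and BugRecord standing in for the LLM agent, the regression run, and the dataset entry.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class BugRecord:
    """One validated bug: the target module, the mutation goal, and the mutated source."""
    module: str
    goal: str
    mutated_src: str


def propose_mutation(module_src: str, goal: str) -> str:
    """Stand-in for the LLM agent that proposes a goal-directed mutation."""
    # A real pipeline would prompt an LLM with the partitioned module and the goal.
    return module_src.replace("==", "!=", 1)


def passes_lint(src: str) -> bool:
    """Stand-in for a syntax/lint check of the mutated RTL."""
    return bool(src.strip())


def detected_by_regression(src: str) -> bool:
    """Stand-in for running the testbench and checking that at least one test fails."""
    return random.random() < 0.9


def synthesize_bug(module_name: str, module_src: str, goal: str,
                   max_attempts: int = 5) -> Optional[BugRecord]:
    """Closed loop: propose, validate, and roll back to the pristine source on failure."""
    for _ in range(max_attempts):
        mutated = propose_mutation(module_src, goal)
        if not passes_lint(mutated):
            continue  # refinement step: ask the agent for another candidate
        if detected_by_regression(mutated):
            return BugRecord(module_name, goal, mutated)
        # rollback: discard this mutation; the next attempt starts from module_src again
    return None


if __name__ == "__main__":
    src = "assign grant = (req == 2'b01);"
    print(synthesize_bug("arbiter", src, "corrupt grant decoding"))
```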

📝 Abstract
Hardware complexity continues to strain verification resources, motivating the adoption of machine learning (ML) methods to improve debug efficiency. However, ML-assisted debugging critically depends on diverse and scalable bug datasets, which existing manual or automated bug insertion methods fail to reliably produce. We introduce BugGen, a first-of-its-kind, fully autonomous, multi-agent pipeline leveraging Large Language Models (LLMs) to systematically generate, insert, and validate realistic functional bugs in RTL. BugGen partitions modules, selects mutation targets via a closed-loop agentic architecture, and employs iterative refinement and rollback mechanisms to ensure syntactic correctness and functional detectability. Evaluated across five OpenTitan IP blocks, BugGen produced 500 unique bugs with 94% functional accuracy and achieved a throughput of 17.7 validated bugs per hour, over five times faster than typical manual expert insertion. Additionally, BugGen identified 104 previously undetected bugs in OpenTitan regressions, highlighting its utility in exposing verification coverage gaps. Compared against Certitude, BugGen demonstrated over twice the syntactic accuracy, deeper exposure of testbench blind spots, and more functionally meaningful and complex bug scenarios. Furthermore, when these BugGen-generated datasets were employed to train ML-based failure triage models, we achieved high classification accuracy (88.1%-93.2%) across different IP blocks, confirming the practical utility and realism of generated bugs. BugGen thus provides a scalable solution for generating high-quality bug datasets, significantly enhancing verification efficiency and ML-assisted debugging.
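The abstract reports that BugGen-generated datasets were used to train ML-based failure-triage models. The exact features and model are not specified in this listing, so the following is a minimal, hypothetical sketch of such a triage setup using scikit-learn; the failure logs, module labels, and model choice are all illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical dataset: one failure signature per validated bug,
# labeled with the IP block that contained the injected bug.
failure_logs = [
    "uart_tx fifo overflow assertion fired at 12us",
    "aes key expansion mismatch in round 3",
    "i2c ack timeout scoreboard error",
    "uart_rx parity error not flagged",
]
faulty_blocks = ["uart", "aes", "i2c", "uart"]

# Vectorize the log text and split into train/test sets.
features = TfidfVectorizer().fit_transform(failure_logs)
X_train, X_test, y_train, y_test = train_test_split(
    features, faulty_blocks, test_size=0.5, random_state=0
)

# Train a simple classifier to attribute each failure to a faulty block.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("triage accuracy:", accuracy_score(y_test, model.predict(X_test)))
```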
Problem

Research questions and friction points this paper is trying to address.

Realistic RTL bug data is scarce, limiting ML-assisted debugging
Existing manual and automated bug-insertion methods fail to deliver diverse, scalable, functionally meaningful bugs
ML-based failure triage needs high-quality bug datasets to train and validate reliably
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent LLM pipeline for RTL bug synthesis
Closed-loop agentic architecture for selecting mutation targets
Iterative refinement and rollback ensure syntactic correctness and functional detectability (see the sketch below)
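As a concrete illustration of the partition-then-mutate idea referenced above, the sketch below splits a toy RTL snippet into per-module partitions and applies a single goal-directed mutation. The regex-based splitter and the example mutation are assumptions for illustration, not BugGen's actual mechanism.

```python
import re

rtl = """
module fifo_ctrl(input clk, input push, input pop, output reg full);
  always @(posedge clk) begin
    if (push && !pop) full <= (count == DEPTH - 1);
  end
endmodule

module arbiter(input req0, input req1, output reg grant0);
  always @(*) grant0 = req0 && !req1;
endmodule
"""

# Partition the design so each agent reasons over one module at a time.
partitions = re.findall(r"module\b.*?endmodule", rtl, flags=re.DOTALL)

# Goal-directed mutation example: weaken a handshake condition in the first partition,
# producing a subtle functional bug rather than a syntax error.
buggy = partitions[0].replace("push && !pop", "push || !pop", 1)
print(buggy)
```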
Authors
Surya Jasper (Texas A&M University)
Minh Luu (Infineon Technologies)
Evan Pan (Texas A&M University)
Aakash Tyagi (Texas A&M University)
Michael Quinn (Texas A&M University)
Jiang Hu (Texas A&M University)
D. Houngninou (Texas A&M University)

High Performance Computing · Hardware Verification · Machine Learning