🤖 AI Summary
To address three key bottlenecks in hardware fuzzing (weak semantic awareness, inefficient test refinement, and high simulation overhead), this paper proposes GoldenFuzz, a two-stage generative hardware fuzzing framework. Methodologically, it introduces the first Golden Reference Model (GRM)-guided decoupled fuzzing paradigm, in which test cases are first refined on a fast, ISA-compliant "digital twin" of the device rather than on slow, cycle-accurate device simulation. It further designs an instruction-block-level generation strategy, coupled with a feedback mechanism that learns from both high- and low-coverage samples, to jointly optimize test quality and deep state-space exploration. Evaluated on RISC-V cores, including RocketChip, BOOM, CVA6, and the commercial BA51-H, the framework reduces test length by 42% and cuts computational overhead by 5.8×. It discovers five previously unknown vulnerabilities (four with CVSS v3 scores above 7.0) and, for the first time, reveals two undocumented defects in the BA51-H core.
📝 Abstract
Modern hardware systems, driven by demands for high performance and application-specific functionality, have grown increasingly complex, introducing large surfaces for bugs and security-critical vulnerabilities. Fuzzing has emerged as a scalable solution for discovering such flaws. Yet, existing hardware fuzzers suffer from limited semantic awareness, inefficient test refinement, and high computational overhead due to reliance on slow device simulation.
In this paper, we present GoldenFuzz, a novel two-stage hardware fuzzing framework that partially decouples test case refinement from coverage and vulnerability exploration. GoldenFuzz leverages a fast, ISA-compliant Golden Reference Model (GRM) as a "digital twin" of the Device Under Test (DUT). It fuzzes the GRM first, enabling rapid, low-cost test case refinement that accelerates deep architectural exploration and vulnerability discovery on the DUT. Throughout the fuzzing pipeline, GoldenFuzz iteratively constructs test cases by concatenating carefully chosen instruction blocks that balance inter- and intra-instruction quality. A feedback-driven mechanism that draws on both high- and low-coverage samples further enhances GoldenFuzz's hardware state exploration. Our evaluation on three RISC-V processors, RocketChip, BOOM, and CVA6, demonstrates that GoldenFuzz significantly outperforms existing fuzzers, achieving the highest coverage with minimal test case length and computational overhead. GoldenFuzz uncovers all known vulnerabilities and discovers five new ones, four of which are classified as highly severe, with CVSS v3 severity scores exceeding 7 out of 10. It also identifies two previously unknown vulnerabilities in the commercial BA51-H core extension.
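The two-stage idea described in the abstract can be illustrated with a toy sketch. This is not GoldenFuzz's implementation: the `BLOCKS` list, the `Simulator` class, the opcode-pair coverage metric, and all function names are hypothetical stand-ins chosen for illustration, and the high-/low-coverage feedback mechanism is omitted. The sketch only shows the control flow: many cheap runs on a reference model refine block-concatenated tests, and only the refined tests are replayed on the (expensive) device simulation.

```python
import random

# Hypothetical instruction blocks: short, fixed RISC-V instruction sequences.
BLOCKS = [
    ("addi x1, x0, 1", "add x2, x1, x1"),
    ("lw x3, 0(x2)", "sw x3, 4(x2)"),
    ("csrrw x4, mstatus, x1",),
    ("beq x1, x2, 8", "nop"),
    ("fence.i",),
]


class Simulator:
    """Stand-in for a GRM or a DUT simulation; counts how often it is run."""

    def __init__(self):
        self.runs = 0

    def coverage(self, test):
        # Toy coverage metric (an assumption): the set of opcode pairs that
        # appear back to back in the concatenated test.
        self.runs += 1
        opcodes = [insn.split()[0] for block in test for insn in block]
        return set(zip(opcodes, opcodes[1:]))


def refine_on_grm(grm, rng, rounds=60, max_blocks=4):
    """Stage 1: grow tests block by block, keeping blocks that add GRM coverage."""
    covered, refined, current = set(), [], []
    for _ in range(rounds):
        candidate = current + [rng.choice(BLOCKS)]
        gained = grm.coverage(candidate) - covered
        if gained:                      # the new block is useful: keep it
            covered |= gained
            current = candidate
        if len(current) == max_blocks:  # test is full: emit it, start a new one
            refined.append(current)
            current = []
    if current:
        refined.append(current)
    return refined, covered


def replay_on_dut(dut, tests):
    """Stage 2: run only the refined tests on the slow device simulation."""
    covered = set()
    for test in tests:
        covered |= dut.coverage(test)
    return covered


def two_stage_fuzz(seed=0):
    rng = random.Random(seed)
    grm, dut = Simulator(), Simulator()
    tests, grm_cov = refine_on_grm(grm, rng)
    dut_cov = replay_on_dut(dut, tests)
    return tests, grm_cov, dut_cov, grm.runs, dut.runs


if __name__ == "__main__":
    tests, grm_cov, dut_cov, grm_runs, dut_runs = two_stage_fuzz()
    print(f"GRM runs: {grm_runs}, DUT runs: {dut_runs}, refined tests: {len(tests)}")
```

In this toy both simulators share the same coverage metric, so the DUT reaches exactly the coverage found on the GRM while being run far fewer times. In a real setting the DUT exposes microarchitectural state that the GRM does not, which is why the abstract describes the decoupling as partial.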