🤖 AI Summary
To address the low efficiency and long verification cycles in designing cache-coherent memory subsystems for multicore SoCs, this paper proposes Rhea, a unified framework integrating RTL-level and system-level simulation. The framework generates configurable, synthesizable RTL and couples gem5 full-system simulation with Verilator cycle-accurate simulation, enabling co-verification of cache coherence protocols (e.g., MSI) and one- or two-level private-cache hierarchies under real operating systems and workloads. Its key contribution is a hybrid simulation methodology that scales to 16 cores and simulates the actual RTL while incurring a simulation-time overhead of at most 2.7× relative to the gem5 Ruby MI model, falling to 1.6× in sixteen-core configurations. The results indicate that the framework supports fast, scalable development of complex cache-coherent memory subsystems.
📝 Abstract
Designing and validating efficient cache-coherent memory subsystems is a critical yet complex task in the development of modern multi-core system-on-chip architectures. Rhea is a unified framework that streamlines the design and system-level validation of RTL cache-coherent memory subsystems. On the design side, Rhea generates synthesizable, highly configurable RTL supporting various architectural parameters. On the validation side, Rhea integrates Verilator's cycle-accurate RTL simulation with gem5's full-system simulation, allowing realistic workloads and operating systems to run alongside the actual RTL under test. We apply Rhea to design MSI-based RTL memory subsystems with one and two levels of private caches, scaling up to sixteen cores. Evaluating these designs with 22 applications from state-of-the-art benchmark suites shows intermediate performance relative to gem5 Ruby's MI and MOESI models. The hybrid gem5-Verilator co-simulation flow incurs a moderate simulation overhead, up to 2.7 times that of gem5 MI, but achieves higher fidelity by simulating real RTL hardware. This overhead decreases with scale, down to 1.6 times in sixteen-core scenarios. These results demonstrate Rhea's effectiveness and scalability in enabling fast development of RTL cache-coherent memory subsystem designs.