CVA6S+: A Superscalar RISC-V Core with High-Throughput Memory Architecture

📅 2025-04-20
🤖 AI Summary
To address limited IPC and memory bandwidth under tight area budgets in high-end embedded applications, particularly automotive electronics, this work proposes CVA6S+, an open-source superscalar RISC-V core. Building on the superscalar CVA6S, it adds improved branch prediction, register renaming, and enhanced operand forwarding, and it integrates the OpenHW Core-V High-Performance L1 DCache (HPDCache). CVA6S+ achieves a 43.5% performance improvement over the scalar CVA6 baseline and a further 10.9% over CVA6S, at an area overhead of only 9.30% relative to CVA6. The HPDCache integration yields a 74.1% bandwidth improvement over the legacy CVA6 cache subsystem. The design thus offers a balanced performance/area trade-off and a reusable open-source foundation for automotive-oriented RISC-V processor development.
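The summary names register renaming as one of the CVA6S+ additions but does not describe how it is organized. As a generic illustration only, the sketch below shows the common map-table plus free-list scheme: every new destination receives a fresh physical register, so two writes to the same architectural register (a WAW hazard) no longer have to serialize. The class name, the 48-entry physical register file, and freeing the old mapping at commit are all assumptions, not details taken from the paper.

```python
from collections import deque
from typing import Optional, Tuple


class RenameUnit:
    """Textbook map-table + free-list register renaming (hypothetical sketch).

    Not the CVA6S+ implementation: table sizes and the commit-time
    freeing policy are assumptions chosen for illustration.
    """

    def __init__(self, num_arch: int = 32, num_phys: int = 48) -> None:
        # At reset, architectural register xi maps to physical register pi.
        self.rat = list(range(num_arch))
        # Physical registers beyond the architectural set start out free.
        self.free_list = deque(range(num_arch, num_phys))

    def rename(self, rd: int, rs1: int, rs2: int) -> Tuple[int, int, int, Optional[int]]:
        """Return (pd, ps1, ps2, old_pd) for an instruction rd <- rs1 op rs2."""
        # Look up sources first, so "add x5, x5, x3" reads the previous x5.
        ps1, ps2 = self.rat[rs1], self.rat[rs2]
        if rd == 0:
            # x0 is hardwired to zero in RISC-V; writes to it are not renamed.
            return 0, ps1, ps2, None
        if not self.free_list:
            raise RuntimeError("rename stall: no free physical register")
        pd = self.free_list.popleft()
        old_pd = self.rat[rd]
        self.rat[rd] = pd
        return pd, ps1, ps2, old_pd

    def commit(self, old_pd: Optional[int]) -> None:
        """Release the previous mapping once the renaming instruction commits."""
        if old_pd is not None:
            self.free_list.append(old_pd)


ru = RenameUnit()
first = ru.rename(rd=5, rs1=1, rs2=2)   # x5 <- x1 op x2
second = ru.rename(rd=5, rs1=5, rs2=3)  # x5 <- x5 op x3, a WAW on x5
print(first, second)  # (32, 1, 2, 5) (33, 32, 3, 32)
```

The second write to x5 lands in a different physical register (p33) than the first (p32), while its source operand correctly reads p32, which is exactly the dependency the renamer must preserve.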

📝 Abstract
Open-source RISC-V cores are increasingly adopted in high-end embedded domains such as automotive, where maximizing instructions per cycle (IPC) is becoming critical. Building on the industry-supported open-source CVA6 core and its superscalar variant, CVA6S, we introduce CVA6S+, an enhanced version incorporating improved branch prediction, register renaming and enhanced operand forwarding. These optimizations enable CVA6S+ to achieve a 43.5% performance improvement over the scalar configuration and 10.9% over CVA6S, with an area overhead of just 9.30% over the scalar core (CVA6). Furthermore, we integrate CVA6S+ with the OpenHW Core-V High-Performance L1 Dcache (HPDCache) and report a 74.1% bandwidth improvement over the legacy CVA6 cache subsystem.
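The abstract reports two relative gains for CVA6S+: 43.5% over the scalar CVA6 and 10.9% over CVA6S. Assuming both are measured on the same workload and that relative speedups compose multiplicatively, the gain of CVA6S over CVA6 implied by these figures can be back-computed as below. The resulting value of roughly 29.4% is a derived inference, not a number stated in the abstract, and the function name is hypothetical.

```python
def implied_intermediate_gain(total_pct: float, last_step_pct: float) -> float:
    """Back-compute the gain of an intermediate design over a baseline.

    Assumes relative speedups compose multiplicatively:
        (1 + total) = (1 + intermediate) * (1 + last_step)
    All values are percentages.
    """
    return ((1 + total_pct / 100) / (1 + last_step_pct / 100) - 1) * 100


# Figures from the abstract: CVA6S+ vs CVA6 and CVA6S+ vs CVA6S.
CVA6SPLUS_OVER_CVA6 = 43.5
CVA6SPLUS_OVER_CVA6S = 10.9

cva6s_over_cva6 = implied_intermediate_gain(CVA6SPLUS_OVER_CVA6, CVA6SPLUS_OVER_CVA6S)
print(f"Implied CVA6S gain over CVA6: {cva6s_over_cva6:.1f}%")  # 29.4%
```

Simply subtracting the percentages (43.5 - 10.9 = 32.6) would overstate the intermediate step, because the 10.9% is measured relative to CVA6S rather than to CVA6.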
Problem

Enhancing RISC-V core performance for high-end embedded applications
Improving branch prediction and operand forwarding in CVA6S+
Boosting memory bandwidth with High-Performance L1 Dcache integration
Innovation

Enhanced branch prediction for higher IPC
Improved register renaming for efficiency
Integrated high-performance L1 Dcache for bandwidth
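Branch prediction is likewise named but not described here. For readers unfamiliar with the idea, a minimal bimodal predictor (a table of 2-bit saturating counters indexed by the branch PC) is sketched below. It is a generic textbook scheme chosen as an assumption, not the predictor used in CVA6S+, and the class name and table size are arbitrary.

```python
class BimodalPredictor:
    """Generic bimodal predictor: 2-bit saturating counters indexed by PC.

    Hypothetical illustration; not the CVA6S+ branch predictor.
    Counter values 0-1 predict not-taken, 2-3 predict taken.
    """

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # reset to weakly not-taken

    def _index(self, pc: int) -> int:
        # With the C extension, RISC-V instructions are 2-byte aligned,
        # so PC bit 0 is always zero and is dropped from the index.
        return (pc >> 1) & self.mask

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


# A loop-closing branch that is taken 9 times, then falls through once.
bp = BimodalPredictor()
pc = 0x8000_0010
outcomes = [True] * 9 + [False]
mispredicts = 0
for taken in outcomes:
    if bp.predict(pc) != taken:
        mispredicts += 1
    bp.update(pc, taken)
print(f"{mispredicts} mispredictions out of {len(outcomes)}")  # 2 out of 10
```

The two mispredictions are the cold start and the loop exit; the 2-bit hysteresis keeps a single not-taken outcome from flipping the prediction for the next execution of the loop.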
Riccardo Tedeschi
DEI, University of Bologna, Bologna, Italy
G. Ottavi
DEI, University of Bologna, Bologna, Italy
Côme Allart
Thales DIS, Meyreuil, France
Nils Wistoff
PhD Student, ETH Zurich
processor design, secure computer architecture
Zexin Fu
IIS, ETH Zurich, Zurich, Switzerland
Filippo Grillotti
STMicroelectronics, Agrate Brianza, Italy
F. D. Ambroggi
STMicroelectronics, Agrate Brianza, Italy
E. Guidetti
STMicroelectronics, Agrate Brianza, Italy
J. Rigaud
Mines Saint-Etienne, CEA, Leti, Centre CMP, F-13541 Gardanne, France
O. Potin
Mines Saint-Etienne, CEA, Leti, Centre CMP, F-13541 Gardanne, France
J. Coulon
Thales DIS, Meyreuil, France
César Fuguet
Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, TIMA, 38000 Grenoble, France
Luca Benini
ETH Zürich, Università di Bologna
Integrated Circuits, Computer Architecture, Embedded Systems, VLSI, Machine Learning
Davide Rossi
Associate Professor, University of Bologna
VLSI systems, Ultra-low-power circuits, multi-core architecture, reconfigurable computing