QiMeng-CPU-v2: Automated Superscalar Processor Design by Learning Data Dependencies

📅 2025-05-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of automated superscalar processor design by proposing the first end-to-end fully automatic CPU generation methodology, overcoming the limitation of prior automated approaches, which support only single-cycle processors. The core innovation is the Stateful Binary Speculation Diagram (State-BSD), a unified framework jointly modeling state selection and state speculation: state selection employs simulated annealing to optimize a lightweight selector, while state speculation extends the Binary Speculation Diagram (BSD) paradigm to train a high-accuracy speculator that leverages on-chip state for real-time data-dependence learning. The method ensures functional correctness and prediction accuracy under resource constraints. The generated QiMeng-CPU-v2 achieves a roughly 380× performance improvement over the previous best automated design and approaches the performance level of manually designed superscalar processors such as the ARM Cortex-A53.

📝 Abstract
Automated processor design, which can significantly reduce human efforts and accelerate design cycles, has received considerable attention. While recent advancements have automatically designed single-cycle processors that execute one instruction per cycle, their performance cannot compete with modern superscalar processors that execute multiple instructions per cycle. Previous methods fail on superscalar processor design because they cannot address inter-instruction data dependencies, leading to inefficient sequential instruction execution. This paper proposes a novel approach to automatically designing superscalar processors using a hardware-friendly model called the Stateful Binary Speculation Diagram (State-BSD). We observe that processor parallelism can be enhanced through on-the-fly inter-instruction dependent-data predictors, reusing the processor's internal states to learn the data dependency. To meet the challenge of both hardware-resource limitation and design functional correctness, State-BSD consists of two components: 1) a lightweight state-selector trained by the simulated annealing method to detect the most reusable processor states and store them in a small buffer; and 2) a highly precise state-speculator trained by the BSD expansion method to predict the inter-instruction dependent data using the selected states. This is the first work to achieve automated superscalar processor design: the resulting QiMeng-CPU-v2 improves performance by about $380\times$ over the state-of-the-art automated design and is comparable to human-designed superscalar processors such as the ARM Cortex-A53.
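The abstract's key observation is that a consumer instruction need not wait for its producer if a predictor can guess the dependent value; the consumer executes speculatively and is replayed only on a misprediction. The following toy sketch illustrates that control flow only; the names (`execute`, `predictor`) are illustrative and not from the paper.

```python
# Toy illustration of value speculation breaking a data dependency.
# All names here are hypothetical; this is not the paper's State-BSD.

def execute(insts, predictor):
    """Run a dependent producer/consumer pair with value speculation:
    the consumer starts from a predicted operand instead of waiting,
    and is re-executed only if the prediction turns out wrong."""
    producer, consumer = insts
    guess = predictor(producer)      # speculate the producer's result
    spec_out = consumer(guess)       # consumer proceeds without waiting
    actual = producer()              # producer completes in parallel
    if guess == actual:              # speculation correct: commit result
        return spec_out
    return consumer(actual)         # mis-speculation: replay the consumer

# r1 = 2 + 3; r2 = r1 * 10  (r2 depends on r1)
producer = lambda: 2 + 3
consumer = lambda r1: r1 * 10
print(execute((producer, consumer), lambda p: 5))  # correct prediction
print(execute((producer, consumer), lambda p: 7))  # misprediction, replayed
```

Both calls return 50; the misprediction path simply costs an extra consumer execution, which is why a high-accuracy speculator is essential to the claimed speedup.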
Problem

Research questions and friction points this paper is trying to address.

Automating superscalar processor design to enhance performance
Addressing inter-instruction data dependencies for parallel execution
Balancing hardware-resource limits with functional correctness
Innovation

Methods, ideas, or system contributions that make the work stand out.

State-BSD model for superscalar processor design
On-the-fly inter-instruction data dependency prediction
Simulated annealing and BSD expansion for training
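The first innovation bullet above pairs simulated annealing (to pick which processor states to buffer) with a trained speculator (to predict dependent data from them). A minimal sketch of that selection idea, under loud assumptions: the data is synthetic, the "speculator" is a plain majority-vote lookup table rather than a BSD, and every function name here is hypothetical.

```python
import random

# Hedged toy: simulated annealing chooses which k "processor state" bits
# to keep in a small buffer so that a table-based speculator predicts a
# dependent bit well. Not the paper's actual State-BSD training procedure.

def accuracy(selected, samples):
    """Score a selection: majority-vote table keyed by the chosen bits."""
    table = {}
    for state, target in samples:
        key = tuple(state[i] for i in selected)
        table.setdefault(key, []).append(target)
    hits = 0
    for state, target in samples:
        votes = table[tuple(state[i] for i in selected)]
        hits += max(set(votes), key=votes.count) == target
    return hits / len(samples)

def anneal(n_bits, k, samples, steps=500, temp=1.0, cool=0.99):
    rng = random.Random(0)
    sel = rng.sample(range(n_bits), k)
    cur_acc = accuracy(sel, samples)
    best, best_acc = sel[:], cur_acc
    for _ in range(steps):
        cand = sel[:]
        cand[rng.randrange(k)] = rng.randrange(n_bits)  # swap one index
        if len(set(cand)) < k:
            continue                                    # skip duplicates
        acc = accuracy(cand, samples)
        # accept uphill moves, and downhill ones with cooling probability
        if acc >= cur_acc or rng.random() < temp:
            sel, cur_acc = cand, acc
            if acc > best_acc:
                best, best_acc = cand[:], acc
        temp *= cool
    return best, best_acc

# synthetic trace: the dependent bit is a function of state bits 2 and 5 only
rng = random.Random(1)
samples = [(s, s[2] ^ s[5])
           for s in ([rng.randint(0, 1) for _ in range(8)] for _ in range(200))]
sel, acc = anneal(n_bits=8, k=2, samples=samples)
print(sorted(sel), round(acc, 2))
```

With luck the search converges on bits {2, 5}, at which point the table predicts perfectly; the real method must do this under much tighter hardware budgets, which is why the selector itself is kept lightweight.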
🔎 Similar Papers
No similar papers found.
Shuyao Cheng
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences
Rui Zhang
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences
Wenkai He
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Shanghai Innovation Center for Processor Technologies
Pengwei Jin
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Shanghai Innovation Center for Processor Technologies
Chongxiao Li
Institute of Computing Technology, Chinese Academy of Sciences
Zidong Du
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; Shanghai Innovation Center for Processor Technologies
Xing Hu
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; Shanghai Innovation Center for Processor Technologies
Yifan Hao
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences
Guanglin Xu
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences
Yuanbo Wen
Institute of Computing Technology, Chinese Academy of Sciences
Ling Li
Institute of Software, Chinese Academy of Sciences
Qi Guo
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences
Yunji Chen
Institute of Computing Technology, Chinese Academy of Sciences
processor architecture, microarchitecture, machine learning