🤖 AI Summary
This work addresses automated superscalar processor design and proposes the first fully automated methodology for it, overcoming the limitation of prior automated approaches, which produce only single-cycle processors. The core innovation is the Stateful Binary Speculation Diagram (State-BSD), a hardware-friendly model with two components. A lightweight state-selector, trained by simulated annealing, picks the most reusable processor states and stores them in a small buffer. A high-accuracy state-speculator, trained by the Binary Speculation Diagram (BSD) expansion method, uses the selected states to predict inter-instruction dependent data on the fly. Together they target functional correctness and prediction accuracy under hardware-resource constraints. The generated QiMeng-CPU-v2 achieves about a 380× performance improvement over the previous best automated design and is comparable to human-designed superscalar processors such as the ARM Cortex-A53.
📝 Abstract
Automated processor design, which can significantly reduce human effort and accelerate design cycles, has received considerable attention. While recent advances have automatically designed single-cycle processors that execute one instruction per cycle, their performance cannot compete with modern superscalar processors that execute multiple instructions per cycle. Previous methods fail on superscalar processor design because they cannot address inter-instruction data dependencies, leading to inefficient sequential instruction execution. This paper proposes a novel approach to automatically designing superscalar processors using a hardware-friendly model called the Stateful Binary Speculation Diagram (State-BSD). We observe that processor parallelism can be enhanced through on-the-fly predictors of inter-instruction dependent data, which reuse the processor's internal states to learn the data dependencies. To meet the challenges of both limited hardware resources and functional correctness of the design, State-BSD consists of two components: 1) a lightweight state-selector, trained by simulated annealing, that detects the most reusable processor states and stores them in a small buffer; and 2) a highly precise state-speculator, trained by the BSD expansion method, that predicts the inter-instruction dependent data using the selected states. This is the first work to achieve automated superscalar processor design. The resulting processor, QiMeng-CPU-v2, improves performance by about $380\times$ over the state-of-the-art automated design and is comparable to human-designed superscalar processors such as the ARM Cortex-A53.
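As a rough illustration of the selection step only, the sketch below uses simulated annealing to pick a fixed-size subset of candidate states. It is not the paper's training procedure: the function name `anneal_select`, the per-state `scores`, and the additive objective are all hypothetical stand-ins, since the abstract does not say how the reusability of a state is measured.

```python
import math
import random


def anneal_select(scores, k, steps=2000, t0=5.0, cooling=0.995, seed=0):
    """Toy simulated annealing: choose k indices maximizing the sum of scores.

    `scores` is a hypothetical per-state usefulness value (an assumption);
    a real selector would be scored by how well the chosen states support
    speculation, which is not specified in the abstract.
    """
    rng = random.Random(seed)
    n = len(scores)
    current = set(rng.sample(range(n), k))
    cur_val = sum(scores[i] for i in current)
    best, best_val = set(current), cur_val
    t = t0
    for _ in range(steps):
        # Propose swapping one selected state for one unselected state.
        out_i = rng.choice(sorted(current))
        in_i = rng.choice(sorted(set(range(n)) - current))
        delta = scores[in_i] - scores[out_i]
        # Accept improvements; accept worse moves with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current.remove(out_i)
            current.add(in_i)
            cur_val += delta
            if cur_val > best_val:
                best, best_val = set(current), cur_val
        t *= cooling  # geometric cooling schedule
    return sorted(best), best_val


if __name__ == "__main__":
    selected, value = anneal_select([1, 9, 3, 8, 2, 7], k=3)
    print(selected, value)
```

The fixed subset size `k` plays the role of the small buffer mentioned in the abstract. With an additive objective like this one, a simple sort would suffice; annealing becomes worthwhile only when the value of a state depends on which other states are selected, which is presumably the case for the real selector.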