🤖 AI Summary
Conventional five-stage RISC pipelines are complex and not well suited to teaching computer architecture. To address this, the paper designs and evaluates a lightweight three-stage RISC-V pipelined microprocessor aimed at teaching and embedded applications. The design is implemented in Verilog, and three-, four-, and five-stage variants are compared on resource utilization, critical path delay, and timing performance, both on Xilinx FPGAs and in ASIC designs synthesized with the open-source SkyWater 130 nm process. The three-stage pipeline achieves the highest operating frequency on both platforms, challenging the common assumption that longer pipelines yield higher clock frequencies. It also reduces logic resource usage by ~30%, has a significantly simpler datapath, and is easier both to teach and to implement. The work presents the first RISC-V pipeline architecture explicitly optimized for teaching, validated across FPGA and ASIC platforms.
📝 Abstract
In computer architecture courses, we usually teach RISC processors using a five-stage pipeline, neglecting alternative organizations. This design choice, rooted in 1980s technology, may not be optimal today, and it is certainly not the easiest pipeline to teach. This paper examines simpler pipeline organizations for RISC processors that are suitable for educational purposes and for implementing embedded processors in FPGAs and ASICs. We analyze the resource costs and maximum clock frequency of various designs implemented in an FPGA, using clock frequency as a performance proxy. Additionally, we validate these results with ASIC designs synthesized using the open-source SkyWater130 process. Contrary to common wisdom, a longer pipeline (up to 5 stages) does not necessarily increase the maximum clock frequency. In two FPGA implementations and one ASIC implementation, we found that a four- or five-stage pipeline leads to a lower clock frequency than a three-stage implementation. The reason is that the forwarding multiplexer in the execution stage is on the critical path, and its width increases with longer pipelines. We also argue that a 3-stage pipeline organization is better suited to teaching the pipeline organization of a microprocessor.
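The forwarding argument can be illustrated with a small behavioral model. The sketch below is not taken from the paper: `forward_operand` and `mux_inputs` are hypothetical helpers, and the model assumes that every result not yet written back to the register file is a forwarding source for the execution stage. How many such results exist at a given pipeline depth depends on the specific design.

```python
from typing import List, Optional, Tuple

# One in-flight result: (destination register, value).
# A destination of None means the instruction writes no register.
InFlight = Tuple[Optional[int], int]


def forward_operand(src_reg: int, regfile_value: int,
                    in_flight: List[InFlight]) -> int:
    """Select an operand the way a forwarding multiplexer would.

    in_flight lists results not yet written back, youngest first
    (an assumption of this sketch, not a detail from the paper).
    """
    if src_reg == 0:
        return 0  # x0 is hardwired to zero in RISC-V and is never forwarded
    for dest, value in in_flight:
        if dest == src_reg:
            return value  # the youngest matching producer wins
    return regfile_value  # no match: use the value read from the register file


def mux_inputs(num_in_flight: int) -> int:
    """Data inputs of the operand multiplexer in this model:
    the register file value plus one input per forwarding source."""
    return 1 + num_in_flight


# One in-flight result: the multiplexer chooses between two inputs.
print(forward_operand(5, 10, [(5, 42)]), mux_inputs(1))
# Two in-flight results: a third input is added to the same multiplexer.
print(forward_operand(5, 10, [(3, 1), (5, 7)]), mux_inputs(2))
```

In this model, each additional stage that holds an unwritten result adds one input to the operand multiplexer, which is the widening effect the abstract places on the critical path.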