🤖 AI Summary
Existing cycle-accurate simulators (e.g., gem5) are prohibitively slow for multicore CPU performance evaluation, while prior machine learning approaches support only basic-block-level prediction and cannot simulate full benchmark programs end to end. To address this, we propose the first deep learning framework enabling full-program-level, high-fidelity, and ultra-fast performance prediction. Our key innovations are: (1) a context-enhanced instruction trace sampling strategy that preserves critical out-of-order execution semantics; and (2) an attention-based neural network architecture that captures long-range dependencies and microarchitectural interactions. Evaluated on an Intel Xeon platform, our method achieves a 2.2–8.3× speedup over gem5’s O3 CPU model while maintaining sub-5% relative error in millisecond-scale cycle prediction. This work marks the first demonstration of end-to-end, high-fidelity, real-time performance simulation for complete benchmark suites.
📝 Abstract
CPU simulators are vital for computer architecture research, primarily for estimating performance under different programs. Simulating modern CPUs both quickly and accurately is challenging, especially for multi-core systems. Modern CPU performance simulators such as gem5 adopt a cycle-accurate, event-driven approach, which makes it time-consuming to simulate the extensive microarchitectural behavior of a real benchmark running on out-of-order CPUs. Recently, machine-learning-based approaches have been proposed to improve simulation speed, but they are currently limited to estimating the cycles of basic blocks rather than of a complete benchmark program. This paper introduces a novel ML-based CPU simulator named CAPSim, which combines an attention-based neural network performance predictor with an instruction trace sampling method annotated with context. The attention mechanism effectively captures long-range influence within the instruction trace and emphasizes critical context information, which lets the model improve prediction accuracy by focusing on the important instructions. CAPSim can predict the execution time of unseen benchmarks significantly faster than an accurate O3 simulator built with gem5. Our evaluation on a commercial Intel Xeon CPU demonstrates that CAPSim achieves a 2.2–8.3x speedup over the gem5-built simulator, outperforming the cutting-edge deep learning approach.
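To make the idea of attention over an instruction trace concrete, the sketch below applies plain scaled dot-product self-attention to a toy trace and feeds the pooled result to a linear head. It is only an illustration under stated assumptions, not CAPSim's actual architecture: the three-dimensional instruction embeddings, the identity query/key/value projections, the mean pooling, and the names `self_attention`, `predict_cycles`, and `head_weights` are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(embeddings):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    embeddings: list of equal-length float vectors, one per instruction.
    Returns (outputs, weights), where weights[i][j] is how strongly
    instruction i attends to instruction j, however far apart they are.
    """
    dim = len(embeddings[0])
    scale = math.sqrt(dim)
    weights = [
        softmax([dot(query, key) / scale for key in embeddings])
        for query in embeddings
    ]
    outputs = [
        [sum(w * value[d] for w, value in zip(row, embeddings)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


def predict_cycles(embeddings, head_weights, bias=0.0):
    """Mean-pool the attended vectors, then apply a linear head.

    The pooling and the linear head are assumptions made for illustration.
    """
    outputs, _ = self_attention(embeddings)
    pooled = [
        sum(vec[d] for vec in outputs) / len(outputs)
        for d in range(len(head_weights))
    ]
    return dot(pooled, head_weights) + bias


# Toy trace: hypothetical one-hot embeddings for four instructions.
trace = [
    [1.0, 0.0, 0.0],  # e.g. a load
    [0.0, 1.0, 0.0],  # e.g. an add
    [0.0, 0.0, 1.0],  # e.g. a branch
    [1.0, 0.0, 0.0],  # a second load at the far end of the trace
]
outputs, weights = self_attention(trace)
estimate = predict_cycles(trace, head_weights=[4.0, 1.0, 2.0])
```

In this toy trace the first load attends to the distant second load exactly as strongly as to itself, because attention scores depend on content rather than on position. That is the property the abstract refers to as capturing long-range influence within the trace. A trained model would learn the projections and the head from measured cycle counts instead of using fixed values.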