ENFOR-SA: End-to-end Cross-layer Transient Fault Injector for Efficient and Accurate DNN Reliability Assessment on Systolic Arrays

📅 2026-01-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of balancing accuracy and efficiency when evaluating the reliability of deep neural networks (DNNs) running on systolic arrays under transient faults. The authors propose ENFOR-SA, an end-to-end cross-layer transient fault injection framework that uses an RTL systolic array model only during the fault injection phase and executes all other computation at the software level. The work shows that, for systolic array architectures, RTL accuracy can be preserved without full-cycle, high-overhead RTL simulation. Using a two-step cross-layer simulation methodology, the framework achieves substantial performance gains: experiments on CNNs and Vision Transformers show an average speedup of 569× over full-SoC RTL simulation and 2.03× over a state-of-the-art cross-layer RTL injection tool, with only a 6% average slowdown compared to pure software-based injection, while maintaining hardware-level fidelity in fault impact assessment.

📝 Abstract

Recent advances in deep learning have produced highly accurate but increasingly large and complex DNNs, making traditional fault-injection techniques impractical. Accurate fault analysis requires RTL-accurate hardware models. However, such models significantly slow evaluation compared with software-only approaches, particularly when combined with expensive HDL instrumentation. In this work, we show that such high-overhead methods are unnecessary for systolic array (SA) architectures and propose ENFOR-SA, an end-to-end framework for DNN transient fault analysis on SAs. Our two-step approach employs cross-layer simulation and uses RTL SA components only during fault injection, with the rest executed at the software level. Experiments on CNNs and Vision Transformers demonstrate that ENFOR-SA achieves RTL-accurate fault injection with only 6% average slowdown compared to software-based injection, while delivering at least two orders of magnitude speedup (average 569×) over full-SoC RTL simulation and a 2.03× improvement over a state-of-the-art cross-layer RTL injection tool. ENFOR-SA code is publicly available at https://github.com/rafaabt/ENFOR-SA.
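
To make the cross-layer idea concrete, here is a minimal pure-Python sketch under stated assumptions. It is not ENFOR-SA's code or API. `sa_matmul` is a hypothetical cycle-level model of an n×n output-stationary systolic array standing in for the RTL component, `sw_matmul` stands in for the software layer, and `TransientFault`, `golden_run`, and `faulty_run` are illustrative names. The fault model (one bit flip in one PE's horizontal operand register, lasting one cycle) and the reading of the two steps as "golden software run, then replay of only the target layer on the array model" are assumptions, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Matrix = List[List[int]]


@dataclass
class TransientFault:
    """Hypothetical single-event upset in one PE's horizontal operand register."""
    cycle: int  # the register is corrupted right after this cycle
    row: int    # PE row
    col: int    # PE column whose outgoing (rightward) operand register is hit
    bit: int    # bit position to flip


def flip_bit(v: int, bit: int, width: int = 32) -> int:
    """Flip one bit of a `width`-bit two's-complement value."""
    u = (v & ((1 << width) - 1)) ^ (1 << bit)
    return u - (1 << width) if u >> (width - 1) else u


def sw_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Software-level matrix multiply (stand-in for the software layer)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def relu(M: Matrix) -> Matrix:
    """Element-wise ReLU activation."""
    return [[max(0, v) for v in row] for row in M]


def sa_matmul(A: Matrix, B: Matrix, fault: Optional[TransientFault] = None) -> Matrix:
    """Cycle-level model of an n x n output-stationary systolic array computing A @ B.

    A (n x k) streams in from the left and B (k x n) from the top, each skewed by one
    cycle per row/column, so PE(i, j) sees A[i][s] and B[s][j] at cycle i + j + s.
    """
    n, k = len(A), len(A[0])
    acc = [[0] * n for _ in range(n)]    # per-PE accumulators (outputs stay in place)
    a_reg = [[0] * n for _ in range(n)]  # operand each PE forwards to its right neighbour
    b_reg = [[0] * n for _ in range(n)]  # operand each PE forwards to the PE below
    for t in range(k + 2 * (n - 1)):
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                # Edge PEs read the skewed input streams; inner PEs read neighbour registers.
                a_in = (A[i][t - i] if 0 <= t - i < k else 0) if j == 0 else a_reg[i][j - 1]
                b_in = (B[t - j][j] if 0 <= t - j < k else 0) if i == 0 else b_reg[i - 1][j]
                acc[i][j] += a_in * b_in
                new_a[i][j], new_b[i][j] = a_in, b_in
        a_reg, b_reg = new_a, new_b
        if fault is not None and t == fault.cycle:
            # Transient upset: PE(row, col + 1) consumes the flipped value next cycle and
            # forwards it along the row; the register itself is overwritten a cycle later.
            a_reg[fault.row][fault.col] = flip_bit(a_reg[fault.row][fault.col], fault.bit)
    return acc


def golden_run(x: Matrix, weights: List[Matrix]) -> Tuple[List[Matrix], Matrix]:
    """Step 1 (software only): cache each layer's input and the fault-free output."""
    inputs, h = [], x
    for idx, W in enumerate(weights):
        inputs.append(h)
        h = sw_matmul(h, W)
        if idx < len(weights) - 1:
            h = relu(h)
    return inputs, h


def faulty_run(inputs: List[Matrix], weights: List[Matrix], target: int,
               fault: TransientFault) -> Matrix:
    """Step 2: replay only the target layer on the array model, finish in software."""
    h = sa_matmul(inputs[target], weights[target], fault)
    for idx in range(target, len(weights)):
        if idx > target:
            h = sw_matmul(h, weights[idx])
        if idx < len(weights) - 1:
            h = relu(h)
    return h


# Usage: a 3x3 array and a two-layer integer model.
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
weights = [[[1, 1, 1], [1, 1, 1], [1, 1, 1]],
           [[1, -1, 0], [0, 1, -1], [0, 0, 1]]]
fault = TransientFault(cycle=1, row=0, col=0, bit=2)
inputs, golden = golden_run(x, weights)
print(golden)                                 # [[6, 0, 0], [15, 0, 0], [24, 0, 0]]
print(sa_matmul(x, weights[0], fault))        # [[6, 10, 10], [15, 15, 15], [24, 24, 24]]
print(faulty_run(inputs, weights, 0, fault))  # [[6, 4, 0], [15, 0, 0], [24, 0, 0]]
```

In this toy run, flipping bit 2 of PE(0,0)'s outgoing operand register after cycle 1 turns the value 2 into 6 as it travels along row 0, so two outputs of the target layer change (6 becomes 10 in both), whereas flipping a single output element in software would, by construction, corrupt only that element. The downstream software layer then masks one of the two errors. Only the target layer is simulated cycle by cycle; every other layer reuses cached software activations, which is the kind of split that keeps the cost close to software-only injection.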

Problem

Research questions and friction points this paper is trying to address.

transient fault injection

DNN reliability

systolic array

RTL simulation

fault analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

systolic array

transient fault injection

cross-layer simulation

DNN reliability

RTL-accurate
