ENFOR-SA: End-to-end Cross-layer Transient Fault Injector for Efficient and Accurate DNN Reliability Assessment on Systolic Arrays

📅 2026-01-31
🤖 AI Summary
This work addresses the trade-off between accuracy and efficiency when evaluating the reliability of deep neural networks (DNNs) running on systolic arrays under transient faults. The authors propose ENFOR-SA, an end-to-end cross-layer transient fault injection framework that uses an RTL-level systolic array model only during the fault injection phase and executes all other computation at the software level. They show that RTL-level accuracy can be preserved without high-overhead, full-SoC RTL simulation. With this two-step cross-layer simulation approach, experiments on CNNs and Vision Transformers report an average speedup of 569× over full-SoC RTL simulation and 2.03× over a state-of-the-art cross-layer RTL injection tool, with only a 6% average slowdown compared with software-based injection, while retaining RTL-accurate fault injection.

📝 Abstract
Recent advances in deep learning have produced highly accurate but increasingly large and complex DNNs, making traditional fault-injection techniques impractical. Accurate fault analysis requires RTL-accurate hardware models. However, this significantly slows evaluation compared with software-only approaches, particularly when combined with expensive HDL instrumentation. In this work, we show that such high-overhead methods are unnecessary for systolic array (SA) architectures and propose ENFOR-SA, an end-to-end framework for DNN transient fault analysis on SAs. Our two-step approach employs cross-layer simulation and uses RTL SA components only during fault injection, with the rest executed at the software level. Experiments on CNNs and Vision Transformers demonstrate that ENFOR-SA achieves RTL-accurate fault injection with only 6% average slowdown compared to software-based injection, while delivering at least two orders of magnitude speedup (average $569\times$) over full-SoC RTL simulation and a $2.03\times$ improvement over a state-of-the-art cross-layer RTL injection tool. ENFOR-SA code is publicly available at https://github.com/rafaabt/ENFOR-SA.
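To make the cross-layer idea concrete, here is a minimal toy sketch in Python. It is not ENFOR-SA's code and does not use its API: the names `TILE`, `flip_bit`, `matmul_sw`, `tile_detailed`, and `matmul_cross_layer` are hypothetical, and the step-by-step accumulator loop is only a stand-in for an RTL systolic array model. Under those assumptions, it shows a matrix multiplication computed on the fast software path everywhere except one output tile, which is recomputed by the detailed model with a single bit flipped in one accumulator at one step.

```python
import random

TILE = 4  # hypothetical array size: TILE x TILE processing elements


def flip_bit(value, bit, width=32):
    """Flip one bit of a signed two's-complement integer of `width` bits."""
    raw = (value & ((1 << width) - 1)) ^ (1 << bit)
    return raw - (1 << width) if raw >> (width - 1) else raw


def matmul_sw(a, b):
    """Fast software path, used for every fault-free part of the layer."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def tile_detailed(a_rows, b_cols, fault=None):
    """Step-by-step accumulation of one output tile (stand-in for an RTL model).

    fault = (step, row, col, bit): after MAC step `step`, flip `bit` of the
    accumulator at (row, col). Operands are assumed small enough that the
    accumulators never overflow 32 bits.
    """
    rows, depth, cols = len(a_rows), len(b_cols), len(b_cols[0])
    acc = [[0] * cols for _ in range(rows)]
    for step in range(depth):
        for i in range(rows):
            for j in range(cols):
                acc[i][j] += a_rows[i][step] * b_cols[step][j]
        if fault is not None and fault[0] == step:
            _, r, c, bit = fault
            acc[r][c] = flip_bit(acc[r][c], bit)  # the transient fault
    return acc


def matmul_cross_layer(a, b, fault_tile, fault):
    """Software everywhere; only `fault_tile` goes through the detailed model."""
    out = matmul_sw(a, b)
    r0, c0 = fault_tile[0] * TILE, fault_tile[1] * TILE
    a_rows = a[r0:r0 + TILE]
    b_cols = [row[c0:c0 + TILE] for row in b]
    for i, row in enumerate(tile_detailed(a_rows, b_cols, fault)):
        out[r0 + i][c0:c0 + len(row)] = row
    return out


rng = random.Random(0)
a = [[rng.randint(-8, 7) for _ in range(8)] for _ in range(8)]
b = [[rng.randint(-8, 7) for _ in range(8)] for _ in range(8)]

golden = matmul_sw(a, b)
faulty = matmul_cross_layer(a, b, fault_tile=(1, 0), fault=(2, 1, 3, 5))
diffs = [(i, j) for i in range(8) for j in range(8) if golden[i][j] != faulty[i][j]]
print(diffs)  # one corrupted output: [(5, 3)]
```

In this toy, the cost of the detailed model is paid for a single tile, which is the intuition behind confining RTL simulation to the fault injection phase. A real flow would propagate the corrupted output through the remaining DNN layers in software and compare the final prediction against the fault-free run.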
Problem

Research questions and friction points this paper is trying to address.

transient fault injection
DNN reliability
systolic array
RTL simulation
fault analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

systolic array
transient fault injection
cross-layer simulation
DNN reliability
RTL-accurate