DeepVigor+: Scalable and Accurate Semi-Analytical Fault Resilience Analysis for Deep Neural Network

📅 2024-10-21
🏛️ arXiv.org
🤖 AI Summary
Traditional fault injection (FI) is computationally prohibitive for assessing the hardware reliability of deep neural networks (DNNs) in safety-critical applications, and statistical FI (SFI), although faster, still takes considerable time on large models. To address this, we propose DeepVigor+, a semi-analytical method that combines a fault propagation analysis model with an efficient procedure for obtaining vulnerability factors (VFs) as reliability metrics. DeepVigor+ estimates VFs with an error below 1% while requiring 14.9 to 26.9 times fewer simulations than the best-known state-of-the-art SFI, so that reliability analysis of emerging DNNs completes within a few minutes.

📝 Abstract
The growing exploitation of Machine Learning (ML) in safety-critical applications necessitates rigorous safety analysis. Hardware reliability assessment is a major concern with respect to measuring the level of safety. Quantifying the reliability of emerging ML models, including Deep Neural Networks (DNNs), is highly complex due to their enormous size in terms of the number of parameters and computations. Conventionally, Fault Injection (FI) is applied to perform a reliability measurement. However, performing FI on modern-day DNNs is prohibitively time-consuming if an acceptable confidence level is to be achieved. To speed up FI for large DNNs, statistical FI has been proposed. However, the run-time for large DNN models is still considerably long. In this work, we introduce DeepVigor+, a scalable, fast, and accurate semi-analytical method as an efficient alternative for reliability measurement in DNNs. DeepVigor+ implements a fault propagation analysis model and attempts to acquire Vulnerability Factors (VFs) as reliability metrics in an optimal way. The results indicate that DeepVigor+ obtains VFs for DNN models with an error of less than 1% and with 14.9 to 26.9 times fewer simulations than the best-known state-of-the-art statistical FI, enabling an accurate reliability analysis for emerging DNNs within a few minutes.
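
To give a feel for why statistical FI remains costly, the sketch below computes how many faults must be injected to reach a given error margin and confidence level, and how a VF is then estimated from the sampled outcomes. It is an illustration only: it assumes the finite-population sample-size formula commonly used in the statistical FI literature, which the abstract does not spell out, and the function names `sfi_sample_size` and `vulnerability_factor` are hypothetical. It does not reproduce the DeepVigor+ method.

```python
import math


def sfi_sample_size(population, margin=0.01, t=2.576, p=0.5):
    """Number of faults to inject for a finite fault population.

    Assumed formula (not taken from the abstract):
        n = N / (1 + e^2 * (N - 1) / (t^2 * p * (1 - p)))

    population: total number of possible faults, N
    margin:     acceptable error margin, e (0.01 = 1%)
    t:          normal quantile for the confidence level (2.576 ~ 99%)
    p:          assumed failure probability (0.5 is the worst case)
    """
    n = population / (1 + margin**2 * (population - 1) / (t**2 * p * (1 - p)))
    return math.ceil(n)


def vulnerability_factor(outcomes):
    """Fraction of injected faults that corrupted the model's output.

    outcomes: iterable of booleans, True if the fault caused a failure.
    """
    outcomes = list(outcomes)
    return sum(outcomes) / len(outcomes)


# Hypothetical example: single bit flips in 10 million 32-bit parameters.
fault_population = 10_000_000 * 32
print(sfi_sample_size(fault_population))

# Toy outcomes from four injected faults, two of which caused a failure.
print(vulnerability_factor([True, False, False, True]))
```

Every sampled fault costs at least one inference pass, and the sample has to be redrawn for each layer or bit position analysed separately, so the total simulation count grows quickly with model size.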
Problem

Research questions and friction points this paper is trying to address.

Scalable and accurate resilience analysis for deep neural networks
Efficient fault injection alternative for large CNN reliability assessment
Fast vulnerability factor calculation with minimal simulation requirements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-analytical method for CNN reliability analysis
Fault propagation model for vulnerability factor acquisition
Significantly reduces simulations while maintaining accuracy
Mohammad Hasan Ahmadilivani
Tallinn University of Technology, Tallinn, Estonia
J. Raik
Tallinn University of Technology, Tallinn, Estonia
Masoud Daneshtalab
Professor and Head of DeepHERO Lab.
Deep Learning, Heterogeneous and Dependable Computing, Interconnection Networks
M. Jenihhin
Tallinn University of Technology, Tallinn, Estonia