🤖 AI Summary
This work addresses the challenge of evaluating soft-error resilience in Vision Transformers (ViTs) for safety-critical applications, where exhaustive fault injection is infeasible because of their massive parameter counts. It is the first work to apply finite-population sampling theory to ViT soft-error analysis. The resulting statistical fault injection framework estimates model robustness from only a few thousand samples, at 99% confidence with a ±1% margin of error, while reducing testing cost by up to 10,700×. The analysis shows that only about 3% of FP32 bit flips cause failures, but most of those failures are catastrophic accuracy collapses. It also localizes the most vulnerable components to normalization layers and to the exponent bits of the IEEE-754 floating-point format.
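The sampling idea can be illustrated with a short Python sketch. It assumes the standard normal-approximation sample size for a proportion with a finite-population correction, which is the form commonly used in statistical fault injection. The summary does not give the exact formula or the assumed failure probability `p`, so `sample_size` and its defaults are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(population: int, margin: float = 0.01,
                confidence: float = 0.99, p: float = 0.5) -> int:
    """Fault injections needed to estimate a failure rate within +/- margin.

    Assumed formula (normal approximation with finite-population correction):
        n = N / (1 + e^2 * (N - 1) / (t^2 * p * (1 - p)))
    N: number of possible faults (e.g. weight bits), e: margin of error,
    t: two-sided critical value for the confidence level,
    p: assumed failure probability (p = 0.5 is the conservative worst case).
    """
    t = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    n = population / (1 + margin**2 * (population - 1) / (t**2 * p * (1 - p)))
    return math.ceil(n)


# n saturates as N grows, so the injection budget barely depends on model size.
for n_faults in (10**6, 10**8, 10**10):
    print(n_faults, sample_size(n_faults))
```

As N grows, n approaches t²p(1 − p)/e², which suggests why the required sample count is essentially independent of model scale. With the worst-case defaults (±1%, 99%, p = 0.5), the limit is just under 16,600 injections. Assuming a smaller failure probability shrinks it in proportion to p(1 − p). Which p the authors use is not stated here.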
📝 Abstract
With the growth of Vision Transformers in safety-critical domains like autonomous systems and medical imaging, ensuring their reliability against soft errors is paramount. While ViTs offer state-of-the-art accuracy, their massive parameter counts render exhaustive fault injection campaigns infeasible. To bridge this gap, a statistical fault injection framework is presented that leverages finite-population sampling theory to provide formal reliability guarantees. It is demonstrated that failure-rate estimates are bounded within a ±1% margin of error at 99% confidence using only a few thousand samples, regardless of model scale. This methodology achieves up to a 10,700× reduction in experimental cost compared to exhaustive approaches, while preserving the ability to localize vulnerabilities across architectural components. Extensive evaluation of architectures such as ViT-Tiny and ViT-Small uncovers a highly non-uniform reliability landscape. It is shown that, while only 3% of FP32 bit flips result in failure, the vast majority of these failures are catastrophic accuracy collapses. Specific vulnerabilities are localized to normalization layers and to critical exponent bits within the IEEE-754 format, providing a mathematical foundation and actionable insights for the design of hardened, edge-deployed ViT architectures.
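Why exponent bits matter so much can be seen by flipping single bits of an FP32 value. The following minimal Python sketch uses only the standard library. `flip_bit` is a hypothetical helper for illustration, not part of the authors' framework, and `w = 0.5` is an arbitrary example weight.

```python
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x, encoded as IEEE-754 FP32, with a single bit inverted.

    Bit 31 is the sign, bits 30..23 the 8-bit exponent, bits 22..0 the mantissa.
    """
    (raw,) = struct.unpack("<I", struct.pack("<f", x))  # FP32 -> uint32
    raw ^= 1 << bit                                     # inject the single-bit fault
    (y,) = struct.unpack("<f", struct.pack("<I", raw))  # uint32 -> FP32
    return y


w = 0.5  # hypothetical weight value
print("mantissa LSB (bit 0): ", flip_bit(w, 0))
print("exponent LSB (bit 23):", flip_bit(w, 23))
print("exponent MSB (bit 30):", flip_bit(w, 30))
print("exponent MSB of 1.0:  ", flip_bit(1.0, 30))
```

Flipping the most significant exponent bit turns 0.5 into 2^127 (about 1.7 × 10^38) and 1.0 into infinity. Flipping the lowest mantissa bit moves 0.5 by only about 6 × 10^-8. This asymmetry is consistent with the exponent-bit vulnerability reported above.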