🤖 AI Summary
This study addresses the threat posed by single-event upsets (SEUs) to physics-informed Real NVP normalizing-flow models for anomaly detection deployed on orbiting satellites, and presents the first systematic radiation-resilience assessment of such AI models.
Method: We propose a hierarchical state/output dual-path fault injection framework, implemented within a custom TensorFlow environment, enabling multi-granularity (zero-value, random-value, and bit-flip) injection into weights, biases, and activation values.
Contribution/Results: Our analysis reveals nonlinear performance degradation patterns triggered by critical-bit flips, identifies radiation-sensitive layers and vulnerable bit positions, and quantifies robustness decay across varying injection locations, fault types, and intensities. The findings provide reproducible empirical evidence and methodological support for fault-tolerant design and high-reliability deployment of onboard AI models.
📝 Abstract
Satellites are used for a multitude of applications, including communications, Earth observation, and space science. Neural networks and deep learning-based approaches now represent the state of the art for enhancing the performance and efficiency of these tasks. Given that satellites are susceptible to various faults, one critical application of Artificial Intelligence (AI) is fault detection. However, despite their advantages, neural networks are vulnerable to radiation-induced errors, which can significantly impact their reliability. Ensuring the dependability of these solutions requires extensive testing and validation, particularly using fault injection methods. This study analyses a physics-informed (PI) real-valued non-volume preserving (Real NVP) normalizing flow model for fault detection in space systems, with a focus on resilience to Single-Event Upsets (SEUs). We present a customized fault injection framework in TensorFlow to assess neural network resilience. Fault injections are applied through two primary methods: Layer State injection, which targets internal network components such as weights and biases, and Layer Output injection, which modifies layer outputs across various activations. Fault types include zeros, random values, and bit-flip operations, applied at varying levels and across different network layers. Our findings reveal several critical insights, notably that bit-flip errors in critical bits can lead to substantial performance degradation or even system failure. With this work, we aim to exhaustively study the resilience of Real NVP models against errors due to radiation, providing a means to guide the implementation of fault tolerance measures.
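The three fault types named above (zeros, random values, and bit flips) can be illustrated with a minimal sketch. This is not the paper's TensorFlow framework: it uses only the Python standard library, and the names `flip_bit` and `inject`, the assumption of 32-bit IEEE 754 weights, and the random-value range of [-1, 1] are all hypothetical choices made for illustration.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a value stored as IEEE 754 float32.

    Bit 0 is the least significant mantissa bit, bits 23-30 are the
    exponent, and bit 31 is the sign. (float32 storage is an assumption.)
    """
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR with a one-hot mask toggles exactly the selected bit.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def inject(weights, fault, fraction, rng, bit=30):
    """Return a copy of `weights` with a fraction of entries corrupted.

    `fault` is "zero", "random", or "bitflip"; at least one entry is hit.
    """
    corrupted = list(weights)
    n_faults = max(1, round(fraction * len(corrupted)))
    for i in rng.sample(range(len(corrupted)), n_faults):
        if fault == "zero":
            corrupted[i] = 0.0
        elif fault == "random":
            # Hypothetical range; the source does not specify one.
            corrupted[i] = rng.uniform(-1.0, 1.0)
        elif fault == "bitflip":
            corrupted[i] = flip_bit(corrupted[i], bit)
        else:
            raise ValueError(f"unknown fault type: {fault}")
    return corrupted


if __name__ == "__main__":
    print(flip_bit(0.5, 31))  # sign bit: -0.5
    print(flip_bit(0.5, 30))  # top exponent bit: 2**127, about 1.7e38
    print(inject([0.5] * 8, "bitflip", 0.25, random.Random(0)))
```

The two `flip_bit` calls show why bit position matters: flipping the sign bit of 0.5 only negates it, while flipping the most significant exponent bit turns it into roughly 1.7e38, and for other values the same flip can produce infinity or NaN. A single corrupted weight of that magnitude can dominate every downstream activation, which is consistent with the abstract's observation that flips in critical bits can cause severe degradation.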