🤖 AI Summary
This work addresses the lack of effective reliability guarantees in Block Floating-Point (BFP) neural network processors under hardware faults, where conventional end-to-end verification fails due to nonlinear block scaling. Through RTL-level fault injection, the study reveals heterogeneous vulnerabilities in BFP neural processing units (NPUs) at both bit-level and path-level granularities. To mitigate these issues, the paper proposes the first fault-tolerant microarchitecture tailored to BFP computation semantics, employing a row-column blocking strategy to decouple mantissa and exponent data paths and integrating an ultra-lightweight error detection and protection mechanism. The resulting design achieves reliability approaching that of dual modular redundancy while incurring only a 3.55% geometric mean performance overhead and less than 2% additional hardware cost.
📝 Abstract
Block Floating-Point (BFP) is emerging as an attractive data format for edge Neural Processing Units (NPUs), combining wide dynamic range with high hardware efficiency. However, its behavior under hardware faults and its suitability for safety-critical deployments remain underexplored. Here, we present the first in-depth empirical reliability study of BFP-based NPUs. Using RTL-level fault injection, our bit- and path-level analysis reveals pronounced heterogeneous vulnerabilities and shows that conventional end-to-end checks become ineffective under nonlinear block scaling. Guided by these insights, we design a fault-tolerant BFP-based NPU microarchitecture that aligns BFP computational semantics with reliability constraints. The design uses a row/column-wise blocking strategy to decouple the fixed-point mantissa computations from the scalar exponent path, and introduces ultra-lightweight protection mechanisms for each. Experimental results demonstrate that our design achieves near-dual-modular-redundancy reliability with only $3.55\%$ geometric mean performance overhead and less than $2\%$ hardware cost.
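For readers unfamiliar with the data format, the key BFP property the paper exploits (one shared exponent per block, fixed-point mantissas) can be illustrated with a minimal sketch. This is a generic illustration of BFP quantization, not the paper's actual NPU encoding; block size, mantissa width, and rounding mode here are illustrative assumptions.

```python
import math

def bfp_quantize(block, mantissa_bits=8):
    """Encode a block of floats as Block Floating-Point:
    one shared exponent for the whole block, plus a signed
    fixed-point mantissa per element (illustrative sketch)."""
    # Shared exponent: chosen so the largest magnitude in the block fits.
    max_abs = max(abs(x) for x in block)
    shared_exp = math.frexp(max_abs)[1] if max_abs != 0 else 0
    # Scale factor mapping floats into the signed mantissa range.
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mantissas = [max(lo, min(hi, round(x / scale))) for x in block]
    return shared_exp, mantissas

def bfp_dequantize(shared_exp, mantissas, mantissa_bits=8):
    """Reconstruct approximate floats from a BFP block."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]
```

Because every mantissa is rescaled by the shared exponent, a single-bit fault in the exponent path perturbs all elements of the block at once, while a mantissa-path fault affects only one element. This asymmetry is why decoupling the two paths, as the proposed microarchitecture does, allows each to receive a protection mechanism matched to its vulnerability.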