FORTALESA: Fault-Tolerant Reconfigurable Systolic Array for DNN Inference

📅 2025-03-06
🤖 AI Summary
Ensuring reliable DNN inference in safety-critical applications remains challenging because hardware accelerators are subject to transient and permanent faults. Method: This paper proposes a run-time reconfigurable, fault-tolerant systolic array architecture with three execution modes and four implementation options. Different network layers are mapped to different execution modes (heterogeneous mapping), and a novel reliability assessment method based on fault propagation analysis is used to explore suitable mode-to-layer mappings. Contribution/Results: The architecture protects the registers and MAC units of the systolic array PEs from transient and permanent faults. Reconfigurability enables a speedup of up to 3×, depending on layer vulnerability. The design requires 6× fewer resources than static redundancy and 2.5× fewer resources than a previously proposed solution for transient faults. All four implementations are evaluated in terms of resource utilization, throughput, and fault tolerance improvement.

📝 Abstract
The emergence of Deep Neural Networks (DNNs) in mission- and safety-critical applications brings their reliability to the front. High performance demands of DNNs require the use of specialized hardware accelerators. Systolic array architecture is widely used in DNN accelerators due to its parallelism and regular structure. This work presents a run-time reconfigurable systolic array architecture with three execution modes and four implementation options. All four implementations are evaluated in terms of resource utilization, throughput, and fault tolerance improvement. The proposed architecture is used for reliability enhancement of DNN inference on systolic array through heterogeneous mapping of different network layers to different execution modes. The approach is supported by a novel reliability assessment method based on fault propagation analysis. It is used for the exploration of the appropriate execution mode-layer mapping for DNN inference. The proposed architecture efficiently protects registers and MAC units of systolic array PEs from transient and permanent faults. The reconfigurability feature enables a speedup of up to $3\times$, depending on layer vulnerability. Furthermore, it requires $6\times$ less resources compared to static redundancy and $2.5\times$ less resources compared to the previously proposed solution for transient faults.
Problem

Research questions and friction points this paper is trying to address.

Enhance reliability of DNN inference on systolic arrays.
Propose reconfigurable systolic array with fault tolerance.
Optimize resource use and throughput for DNN accelerators.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Run-time reconfigurable systolic array architecture
Heterogeneous mapping for reliability enhancement
Novel reliability assessment method
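
The heterogeneous mapping idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the mode names (`performance`, `dmr`, `tmr`), their relative time costs, the per-layer vulnerability scores, and the two thresholds are all hypothetical. The sketch assigns each layer the cheapest mode that its vulnerability score allows and then compares total run time against running every layer in the most redundant mode.

```python
# Hypothetical execution modes and their relative time cost per unit of work.
# Assumption: a fully redundant mode takes 3x as long as a non-redundant one.
MODE_COST = {
    "performance": 1,  # no redundancy (hypothetical name)
    "dmr": 2,          # duplicated execution (hypothetical name)
    "tmr": 3,          # triplicated execution (hypothetical name)
}


def map_layers(vulnerability, detect_threshold=0.3, correct_threshold=0.7):
    """Pick a mode per layer from a vulnerability score in [0, 1].

    The scores and thresholds are placeholders; in practice they would come
    from a reliability assessment such as fault propagation analysis.
    """
    mapping = {}
    for layer, score in vulnerability.items():
        if score >= correct_threshold:
            mapping[layer] = "tmr"
        elif score >= detect_threshold:
            mapping[layer] = "dmr"
        else:
            mapping[layer] = "performance"
    return mapping


def speedup_vs_full_redundancy(mapping, layer_work):
    """Ratio of all-'tmr' run time to the run time of the given mapping."""
    baseline = sum(MODE_COST["tmr"] * work for work in layer_work.values())
    actual = sum(MODE_COST[mapping[name]] * work
                 for name, work in layer_work.items())
    return baseline / actual


if __name__ == "__main__":
    # Made-up scores and equal work per layer, purely for demonstration.
    scores = {"conv1": 0.9, "conv2": 0.5, "fc": 0.1}
    work = {"conv1": 1.0, "conv2": 1.0, "fc": 1.0}
    mapping = map_layers(scores)
    print(mapping)
    print(speedup_vs_full_redundancy(mapping, work))
```

Under these assumed costs, the speedup over full redundancy approaches 3 only when every layer can run without redundancy, which is one way to read a speedup bound that depends on layer vulnerability.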