In-Situ Hardware Error Detection Using Specification-Derived Petri Net Models and Behavior-Derived State Sequences

📅 2025-05-07
🤖 AI Summary
Hardware accelerators deployed in data centers and safety-critical systems are vulnerable to soft errors in control-flow operations, which can cause silent data corruption and system failure. To address this, we propose two complementary online control-flow error detection techniques: (1) specification-derived Petri net modeling and (2) behavior-derived state sequence comparison. Our approach jointly leverages specification-level formal modeling and analysis of runtime behavior for fine-grained anomaly detection. Because the two techniques can be applied selectively, the detector can be configured to fit a range of area constraints, trading fault coverage against overhead. Evaluated on four RTL designs (convolution, Gaussian blur, AES encryption, and a NoC router), the method detects 48%–100% of failures caused by bit-flips in internal control registers and perturbations of primary control inputs, with only 0.5%–10% area overhead.
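A specification-derived Petri net monitor can be pictured as follows: places hold tokens for the control phases the specification allows, each observed control event must fire an enabled transition, and an event whose transition is not enabled is flagged as a control-flow error. The sketch below is a minimal software model under stated assumptions: the net (places such as `idle`, `read_busy`, `mac_busy` and transitions such as `start`, `finish`) is a hypothetical fork/join handshake, not a net derived in the paper; all arc weights are 1; and the class `PetriNetMonitor` and helper `make_monitor` are illustrative names, not the authors' RTL.

```python
from dataclasses import dataclass


@dataclass
class PetriNetMonitor:
    """Toy control-flow monitor built from a hand-written Petri net.

    Assumptions (hypothetical, not from the paper): every arc has weight 1,
    no place is listed twice in a transition's inputs, and each observed
    control event is named after the transition it should fire.
    """

    marking: dict[str, int]  # tokens currently held in each place
    # transition name -> (input places, output places)
    transitions: dict[str, tuple[tuple[str, ...], tuple[str, ...]]]
    error: bool = False  # latched once any illegal event is seen

    def enabled(self, name: str) -> bool:
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) > 0 for p in inputs)

    def observe(self, event: str) -> bool:
        """Fire the transition for an observed event; return False on error."""
        if event not in self.transitions or not self.enabled(event):
            self.error = True  # event not allowed by the specification model
            return False
        inputs, outputs = self.transitions[event]
        for p in inputs:
            self.marking[p] -= 1  # consume one token per input place
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1  # produce tokens
        return True


def make_monitor() -> PetriNetMonitor:
    # Hypothetical fork/join handshake: 'start' launches a read and a MAC
    # operation in parallel; 'finish' is legal only after both complete.
    return PetriNetMonitor(
        marking={"idle": 1},
        transitions={
            "start": (("idle",), ("read_busy", "mac_busy")),
            "read_done": (("read_busy",), ("read_ok",)),
            "mac_done": (("mac_busy",), ("mac_ok",)),
            "finish": (("read_ok", "mac_ok"), ("idle",)),
        },
    )


if __name__ == "__main__":
    ok = make_monitor()
    print(all(ok.observe(e) for e in ["start", "mac_done", "read_done", "finish"]))  # True
    bad = make_monitor()
    bad.observe("start")
    bad.observe("mac_done")
    print(bad.observe("finish"))  # False: the read has not completed
```

The fork at `start` and the join at `finish` are where a Petri net is more compact than a flat state machine, which is one plausible reason to model specifications with concurrent sub-operations this way.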

📝 Abstract
In hardware accelerators used in data centers and safety-critical applications, soft errors and the resulting silent data corruption significantly compromise reliability, particularly when upsets occur in control-flow operations, where they can lead to severe failures. To address this, we introduce two methods for monitoring control flows: one using specification-derived Petri nets and one using behavior-derived state transitions. We validated both methods on four designs: a convolutional layer operation, Gaussian blur, AES encryption, and a Network-on-Chip router. Our fault injection campaign targeting the control registers and primary control inputs demonstrated high error detection rates in both datapath and control logic. Synthesis results show that the maximum detection rate is achieved with an area overhead of a few percent to around 10% in most cases. The proposed detectors quickly detect 48% to 100% of failures resulting from upsets in internal control registers and perturbations in primary control inputs. We also compare the two methods in terms of area overhead and error detection rate. By applying them selectively, a wide range of area constraints can be accommodated, enabling practical implementation and effectively enhancing error detection capabilities.
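The behavior-derived approach can be illustrated with a sketch that records every window of `k` consecutive control states seen in fault-free (golden) simulation and flags the first runtime window that was never observed. Everything here is an assumption for illustration: the class `StateSequenceMonitor`, the window length `k`, the 2-bit state encoding, and the injected bit flip are hypothetical and are not taken from the paper's designs.

```python
from collections import deque
from collections.abc import Hashable, Iterable, Sequence
from typing import Optional


class StateSequenceMonitor:
    """Flags the first window of k consecutive control states that never
    appeared in fault-free (golden) traces. Hypothetical sketch."""

    def __init__(self, k: int = 2) -> None:
        if k < 2:
            raise ValueError("k must be >= 2 to cover at least one transition")
        self.k = k
        self.legal: set[tuple[Hashable, ...]] = set()

    def train(self, golden: Sequence[Hashable]) -> None:
        # Record every length-k window observed in a fault-free trace.
        for i in range(len(golden) - self.k + 1):
            self.legal.add(tuple(golden[i:i + self.k]))

    def first_error(self, trace: Iterable[Hashable]) -> Optional[int]:
        """Return the cycle at which an unseen window completes, else None."""
        window: deque = deque(maxlen=self.k)  # sliding window of recent states
        for cycle, state in enumerate(trace):
            window.append(state)
            if len(window) == self.k and tuple(window) not in self.legal:
                return cycle
        return None


if __name__ == "__main__":
    IDLE, LOAD, COMPUTE, STORE = range(4)  # hypothetical 2-bit state encoding
    golden = [IDLE, LOAD, LOAD, COMPUTE, COMPUTE, STORE, IDLE]
    monitor = StateSequenceMonitor(k=2)
    monitor.train(golden)
    print(monitor.first_error(golden))  # None
    faulty = list(golden)
    faulty[3] ^= 0b01  # single bit flip in the state register: COMPUTE -> STORE
    print(monitor.first_error(faulty))  # 3, window (LOAD, STORE) never seen
```

With `k = 2` the learned set is simply a table of legal state transitions; a larger `k` can catch subtler sequence errors at the cost of a larger table, which loosely mirrors the area-versus-coverage trade-off described in the abstract.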
Problem

Detect hardware errors in control-flow operations
Monitor control flows using Petri nets and state transitions
Achieve high error detection with low area overhead
Innovation

Specification-derived Petri nets for error detection
Behavior-derived state transitions monitoring
Low area overhead with high detection rates
Tomonari Tanaka
Department of Communications and Computer Engineering, Kyoto University, Kyoto, 606-8501, Japan
T. Uezono
Production Engineering and MONOZUKURI Innovation Center, Center for Sustainability, Research and Development Group, Hitachi, Ltd., Yokohama 244-0817, Japan
Kohei Suenaga
Department of Communications and Computer Engineering, Kyoto University, Kyoto, 606-8501, Japan
Masanori Hashimoto
Kyoto University
VLSI, EDA, Reconfigurable architecture