LEFT-RS: A Lock-Free Fault-Tolerant Resource Sharing Protocol for Multicore Real-Time Systems

📅 2025-12-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In multicore real-time systems, conventional locking mechanisms fail to tolerate transient faults within critical sections, leading to cross-task error propagation. This work proposes a lock-free, fault-aware resource sharing protocol that leverages parallel replica execution and fault-state–aware scheduling to enable concurrent critical-section entry, dynamic access termination or early completion, and formal worst-case response time (WCRT) analysis. Its core contribution is the first lock-free fault-tolerant sharing paradigm, jointly optimizing fault isolation and real-time guarantees. Experimental evaluation demonstrates an average 84.5% improvement in schedulability, significant reduction in blocking overhead, and a 62% decrease in fault recovery latency—achieving strict 100% timing compliance under representative multicore real-time workloads.

Technology Category

Application Category

📝 Abstract
Emerging real-time applications have driven the transition to multicore embedded systems, where tasks must share resources due to functional demands and limited availability. These resources, whether local or global, are protected within critical sections to prevent race conditions, with locking protocols ensuring both exclusive access and timing requirements. However, transient faults occurring within critical sections can disrupt execution and propagate errors across multiple tasks. Conventional locking protocols fail to address such faults, and integrating traditional fault tolerance techniques often increases blocking. Recent approaches improve fault recovery through parallel replica execution; however, challenges remain due to sequential accessing, coordination overhead, and susceptibility to common-mode faults. In this paper, we propose a Lock-frEe Fault-Tolerant Resource Sharing (LEFT-RS) protocol for multicore real-time systems. LEFT-RS allows tasks to concurrently access and read global resources while entering their critical sections in parallel. Each task can complete its access earlier upon successful execution if other tasks experience faults, thereby improving the efficiency of resource usage. Our design also limits the overhead and enhances fault resilience. We present a comprehensive worst-case response time analysis to ensure timing guarantees. Extensive evaluation results demonstrate that our method significantly outperforms existing approaches, achieving up to an 84.5% improvement in schedulability on average.
Problem

Research questions and friction points this paper is trying to address.

Enables concurrent fault-tolerant resource sharing in multicore real-time systems
Addresses transient faults in critical sections without increasing blocking
Ensures timing guarantees through worst-case response time analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lock-free protocol for concurrent resource access
Parallel critical section entry with fault tolerance
Worst-case response time analysis for timing guarantees
🔎 Similar Papers
No similar papers found.
N
Nan Chen
University of York, UK
X
Xiaotian Dai
University of York, UK
T
Tong Cheng
Sun Yat-sen University, China
Alan Burns
Alan Burns
University of York, UK
Real-Time Systemsschedulingprogramming languages
Iain Bate
Iain Bate
Real-Time Systems Group (RTSRG), University of York
Real-time systemsDependable SystemsWireless Sensor NetworksScheduling and Timing Analysis
S
Shuai Zhao
Sun Yat-sen University, China