RACS and SADL: Towards Robust SMR in the Wide-Area Network

📅 2024-04-05
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Leader-based consensus protocols such as Raft suffer frequent view changes and degraded performance when wide-area cloud networks experience adversarial disruptions such as network partitions and high packet loss. To address this, the paper proposes RACS, a Raft-inspired, asynchronously robust randomized consensus protocol, along with its optimized variant SADL-RACS. Key innovations include Raft-inspired randomized leader election, asynchronous-safe log replication, adaptive failure detection, and lightweight view switching. Under adversarial network conditions, RACS achieves 28k commands/sec, at least 10× the throughput of Raft and Multi-Paxos (under 2.8k commands/sec). Under synchronous conditions it matches them at 200k commands/sec with optimal one-round-trip latency. SADL-RACS raises throughput further, to 500k commands/sec under synchronous conditions and 196k commands/sec under adversarial conditions. According to the authors, no previous asynchronous protocol has combined this robustness with seamless integration into the existing Raft-based cloud software stack.

📝 Abstract
Widely deployed consensus protocols in the cloud are often leader-based and optimized for low latency under synchronous network conditions. However, cloud networks can experience disruptions such as network partitions, high-loss links, and configuration errors. These disruptions interfere with the operation of leader-based protocols, as their view change mechanisms interrupt the normal case replication and cause the system to stall. This paper proposes RACS, a novel randomized consensus protocol that ensures robustness against adversarial network conditions. RACS achieves optimal one-round trip latency under synchronous network conditions while remaining resilient to adversarial network conditions. RACS follows a simple design inspired by Raft, the most widely used consensus protocol in the cloud, and therefore enables seamless integration with the existing cloud software stack, a goal no previous asynchronous protocol has successfully achieved. Experiments with a prototype deployed on Amazon EC2 confirm that RACS achieves a throughput of 28k cmd/sec under adversarial cloud network conditions, whereas existing leader-based protocols such as Multi-Paxos and Raft provide less than 2.8k cmd/sec. Under synchronous network conditions, RACS matches the performance of Multi-Paxos and Raft, achieving a throughput of 200k cmd/sec with a latency of 300 ms, confirming that RACS introduces no unnecessary overhead. Finally, SADL-RACS, an optimized version of RACS designed for high performance and robustness, achieves an impressive throughput of 500k cmd/sec under synchronous network conditions and 196k cmd/sec under adversarial network conditions, further enhancing both performance and robustness.
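
To put the reported figures side by side, the short Python sketch below computes the throughput ratios implied by the abstract. The numbers are copied from the abstract; the names `THROUGHPUT`, `speedup`, and `retained` are hypothetical helpers introduced here for illustration, and the Multi-Paxos/Raft adversarial figure is only an upper bound ("less than 2.8k"), so ratios against it are lower bounds.

```python
# Throughput in commands/sec, copied from the abstract.
# The Multi-Paxos/Raft adversarial value is reported as "less than 2.8k",
# so 2_800 is an upper bound and any ratio against it is a lower bound.
THROUGHPUT = {
    "RACS": {"synchronous": 200_000, "adversarial": 28_000},
    "SADL-RACS": {"synchronous": 500_000, "adversarial": 196_000},
    "Multi-Paxos/Raft": {"adversarial": 2_800},
}


def speedup(protocol: str, baseline: str, condition: str) -> float:
    """Throughput of `protocol` divided by that of `baseline` under `condition`."""
    return THROUGHPUT[protocol][condition] / THROUGHPUT[baseline][condition]


def retained(protocol: str) -> float:
    """Fraction of synchronous throughput that `protocol` keeps under adversarial conditions."""
    figures = THROUGHPUT[protocol]
    return figures["adversarial"] / figures["synchronous"]


if __name__ == "__main__":
    print(f"RACS vs Multi-Paxos/Raft (adversarial): at least "
          f"{speedup('RACS', 'Multi-Paxos/Raft', 'adversarial'):.1f}x")
    print(f"SADL-RACS vs RACS (synchronous): "
          f"{speedup('SADL-RACS', 'RACS', 'synchronous'):.1f}x")
    print(f"SADL-RACS vs RACS (adversarial): "
          f"{speedup('SADL-RACS', 'RACS', 'adversarial'):.1f}x")
    for name in ("RACS", "SADL-RACS"):
        print(f"{name} retains {retained(name):.1%} of its synchronous throughput")
```

These ratios are derived purely from the abstract's headline numbers and say nothing about latency or about the experimental setup behind each figure.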
Problem

Research questions and friction points this paper is trying to address.

Leader-based consensus protocols stall when cloud networks suffer partitions, high-loss links, or configuration errors
Optimal latency under synchronous network conditions must be preserved
Throughput must hold up under adversarial network conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Randomized consensus protocol RACS
Resilient to adversarial network conditions
Seamless integration with cloud software
Authors

Pasindu Tennage (EPFL and ISTA)
Antoine Desjardins (ISTA)
Lefteris Kokoris-Kogias (ISTA and Mysten Labs)