🤖 AI Summary
To address frequent view changes and performance degradation in leader-based consensus protocols (e.g., Raft) under adversarial disruptions in wide-area cloud networks, such as network partitions and high packet loss, this paper proposes RACS, a randomized consensus protocol that is robust under asynchrony while following a Raft-inspired design, along with its optimized variant SADL-RACS. Key innovations include Raft-inspired randomized leader election, asynchronous-safe log replication, adaptive failure detection, and lightweight view switching. Under adversarial (asynchronous) network conditions, RACS achieves 28k commands/sec, more than 10× the throughput of Raft and Multi-Paxos (below 2.8k commands/sec), and under synchronous conditions it reaches 200k commands/sec with optimal one-round-trip latency. SADL-RACS further raises throughput to 500k commands/sec under synchronous conditions and 196k commands/sec under asynchronous conditions. The authors present RACS as the first asynchronous protocol to combine robustness and high performance while integrating seamlessly with the existing Raft-based cloud software stack.
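Randomized leader election lets a protocol in this family pick a leader without relying on timeouts that an adversarial network can keep triggering. The sketch below shows only the general idea and assumes a shared random seed that stands in for a common coin: every replica derives the same leader for a given view locally, so no extra election round is needed. The function name `coin_leader` and the SHA-256-of-seed construction are hypothetical illustrations, not the mechanism RACS uses. A real asynchronous protocol would need an unpredictable coin (for example, one built from threshold signatures and revealed only after proposals are fixed), because a hash of a known seed is predictable to an adversary.

```python
import hashlib


def coin_leader(seed: bytes, view: int, n: int) -> int:
    """Map (seed, view) to a replica id in [0, n) deterministically.

    Hypothetical stand-in for a common coin: every replica holding the same
    seed computes the same leader for a view without exchanging messages.
    NOTE: a hash of a known seed is predictable, so this is not adversary-proof.
    """
    digest = hashlib.sha256(seed + view.to_bytes(8, "big")).digest()
    # Interpret the 256-bit digest as an integer and reduce it modulo the cluster size.
    return int.from_bytes(digest, "big") % n


if __name__ == "__main__":
    seed = b"shared-epoch-seed"  # hypothetical seed shared by all replicas
    n = 5
    # Each of the n replicas computes the leaders of views 0..5 independently.
    per_replica = [[coin_leader(seed, v, n) for v in range(6)] for _ in range(n)]
    # Determinism means every replica agrees on every view's leader.
    assert all(row == per_replica[0] for row in per_replica)
    print("leaders for views 0..5:", per_replica[0])
```

Because the leader is a pure function of the seed and the view number, agreement on the leader costs no messages; the hard part in an asynchronous setting, which this sketch deliberately leaves out, is making the coin unpredictable until it is needed.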
📝 Abstract
Widely deployed consensus protocols in the cloud are often leader-based and optimized for low latency under synchronous network conditions. However, cloud networks can experience disruptions such as network partitions, high-loss links, and configuration errors. These disruptions interfere with leader-based protocols because their view change mechanisms interrupt normal-case replication and cause the system to stall. This paper proposes RACS, a novel randomized consensus protocol that remains robust under adversarial network conditions while achieving optimal one-round-trip latency under synchronous network conditions. RACS follows a simple design inspired by Raft, the most widely used consensus protocol in the cloud, and therefore integrates seamlessly with the existing cloud software stack, a goal no previous asynchronous protocol has achieved. Experiments with a prototype deployed on Amazon EC2 confirm that RACS achieves a throughput of 28k cmd/sec under adversarial cloud network conditions, whereas existing leader-based protocols such as Multi-Paxos and Raft provide less than 2.8k cmd/sec. Under synchronous network conditions, RACS matches the performance of Multi-Paxos and Raft, achieving a throughput of 200k cmd/sec with a latency of 300ms, confirming that RACS introduces no unnecessary overhead. Finally, SADL-RACS, an optimized version of RACS designed for both high performance and robustness, achieves a throughput of 500k cmd/sec under synchronous network conditions and 196k cmd/sec under adversarial network conditions.
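The one-round-trip latency refers to the synchronous normal case, where a leader can commit an entry as soon as a majority of replicas have acknowledged it. The following is a minimal sketch of that majority-commit rule, modeled on Raft's commit-index computation: given the highest log index each replica (leader included) is known to store, the committed index is the largest index present on a majority. The name `majority_commit_index` is hypothetical, and the sketch omits Raft's additional requirement that the committed entry belong to the leader's current term.

```python
def majority_commit_index(match_index: list[int]) -> int:
    """Return the highest log index stored on a majority of replicas.

    match_index[i] is the highest index known to be replicated on replica i,
    including the leader's own log (hypothetical helper, not the RACS code).
    """
    n = len(match_index)
    quorum = n // 2 + 1
    # Sort descending: the value at position quorum-1 is held by every replica
    # at or before that position, i.e. by at least `quorum` replicas.
    ordered = sorted(match_index, reverse=True)
    return ordered[quorum - 1]


if __name__ == "__main__":
    # Leader stores up to index 10; followers have acknowledged up to 9, 7, 3, 2.
    print(majority_commit_index([10, 9, 7, 3, 2]))  # prints 7
```

With five replicas the quorum is three, so index 7 commits once the leader and two followers hold it, after a single round of acknowledgements; the two slower followers do not delay the commit.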