GapFuzz: Cross-Plane Divergence Fuzzing for Distributed SDN Controllers

📅 2026-06-09
🤖 AI Summary
This work addresses state inconsistency between the control and data planes of distributed SDN controllers, caused by race conditions during asynchronous flow-table replication. Existing fuzzers struggle to detect this class of cross-plane divergence because they confine their oracle to the control plane, target a single controller, or do not steer concurrency toward replication races. The paper proposes GapFuzz, a state-aware concurrency fuzzer that, for the first time, extends fuzzing into the joint state space of the control plane and the kernel datapath. GapFuzz injects pairs of contradictory Northbound requests on two non-master nodes with a controlled inter-injection delay Δt and reconstructs the cross-plane state by querying every replica and tracing the kernel-datapath action with ovs-appctl ofproto/trace. A two-phase timing search first detects whether a divergence exists, then doubles and bisects Δt to bound the injection-time window. A lifetime probe labels each divergence transient or persistent, and a state classification derived from the ONOS 2.7 source assigns it to one of four cross-plane state classes. On a three-node ONOS 2.7 cluster, 81.7% of attempts produced a divergent verdict, and 99.4% of divergences persisted beyond 30 seconds. Probing the kernel datapath instead of relying only on the user-space OpenFlow probe used by prior fuzzers raises the detection rate by 26.6 percentage points.
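A minimal sketch of the two-phase timing search, assuming a hypothetical `diverges(dt)` callback that runs one injection trial at inter-injection delay `dt` (seconds) and returns the cross-plane oracle's verdict. Running Phase 1 at `dt = 0`, the 5 ms starting delay, and the 1 ms bisection tolerance are illustrative assumptions; only the cap `dt_max = 10.24` s comes from the paper. With a 5 ms start, eleven doublings reach that cap.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WindowResult:
    divergent: bool          # Phase 1: was a divergence observed at all?
    capped: bool             # Phase 2 reached dt_max while still divergent
    lower: Optional[float]   # largest delay (s) observed to diverge
    upper: Optional[float]   # smallest delay (s) observed not to diverge


def search_window(diverges: Callable[[float], bool],
                  dt0: float = 0.005,
                  dt_max: float = 10.24,
                  tol: float = 0.001) -> WindowResult:
    """Two-phase search over the inter-injection delay dt (seconds).

    `diverges(dt)` is a hypothetical callback: inject one contradictory
    request pair with delay dt and return the oracle's verdict.
    """
    if dt0 <= 0 or dt_max < dt0:
        raise ValueError("need 0 < dt0 <= dt_max")

    # Phase 1 (assumed here): detection with simultaneous injection, dt = 0.
    if not diverges(0.0):
        return WindowResult(False, False, None, None)

    # Phase 2a: double dt until the divergence disappears or dt reaches dt_max.
    lo, dt = 0.0, dt0
    while True:
        if not diverges(dt):
            hi = dt
            break
        lo = dt
        if dt >= dt_max:
            # Doubling-cap regime: the window extends at least to dt_max.
            return WindowResult(True, True, lo, None)
        dt = min(dt * 2, dt_max)

    # Phase 2b: bisect between lo (divergent) and hi (not divergent).
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diverges(mid):
            lo = mid
        else:
            hi = mid
    return WindowResult(True, False, lo, hi)
```

The sketch treats each trial as deterministic. Real replication races are noisy, so a practical version would likely repeat each `diverges(dt)` call several times and aggregate the verdicts before moving the search bounds.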
📝 Abstract
Distributed Software-Defined Networking (SDN) clusters replicate flow state asynchronously between a master node and its backups, leaving a window during which two backup nodes can each commit a contradictory rule, the master can serialize both into the data plane, and the kernel datapath can latch onto an action that no node believes authoritative. Existing SDN fuzzers miss this fault: they confine their oracle to the control plane, target a single controller, or do not steer concurrency to provoke replication races. We present GapFuzz, a stateful concurrency fuzzer for distributed SDN clusters. GapFuzz injects pairs of contradictory Northbound requests on two non-master nodes with controlled inter-injection delay $\Delta t$, and reconstructs the global cross-plane state by querying every replica and the kernel-datapath action through ovs-appctl ofproto/trace. A two-phase timing search detects whether a divergence exists, then doubles and bisects on $\Delta t$ to bound the injection-time window; a lifetime probe labels each verdict transient or persistent and assigns it to one of four cross-plane state classes derived from the ONOS 2.7 source. On a three-node ONOS 2.7 cluster, GapFuzz produces a divergent verdict in 81.7% of attempts ($N=50$, Wilson 95% CI $[77.3\%, 85.4\%]$); every divergence sits between the cluster's authoritative state and the kernel datapath. Phase 2 separates a 5 ms race window for one template from a doubling-cap regime ($\Delta t_{\max} = 10.24$ s) for six others, and 99.4% of divergences persist past 30 s. Replacing the kernel-datapath probe with the OpenFlow user-space probe used by prior fuzzers drops detection by 26.6 percentage points overall and by 46.5 points after excluding canonicalization-forced verdicts.
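The oracle described above compares every replica's view with the action the kernel datapath actually applies. A minimal sketch of that comparison, assuming each replica's rule has already been canonicalized into the same syntax that `ovs-appctl ofproto/trace` prints after `Datapath actions:`. The helper names and the three verdict labels are hypothetical and are not the paper's four ONOS-derived state classes; the 30 s horizon mirrors the persistence threshold reported in the abstract.

```python
import re
from typing import Dict, List, Tuple

# Matches lines such as "Datapath actions: 2" or "Datapath actions: drop"
# in the text printed by `ovs-appctl ofproto/trace`.
_DP_ACTIONS = re.compile(r"^Datapath actions:[ \t]*(.*)$", re.MULTILINE)


def trace_argv(bridge: str, flow: str) -> List[str]:
    """Build the ovs-appctl command that traces `flow` through `bridge`.

    Run it with subprocess.run(argv, capture_output=True, text=True, check=True)
    and pass the resulting stdout to datapath_action().
    """
    return ["ovs-appctl", "ofproto/trace", bridge, flow]


def datapath_action(trace_output: str) -> str:
    """Return the datapath action string (the last match if several are printed)."""
    matches = _DP_ACTIONS.findall(trace_output)
    if not matches:
        raise ValueError("no 'Datapath actions:' line in trace output")
    return matches[-1].strip()


def cross_plane_verdict(replica_actions: Dict[str, str], dp_action: str) -> str:
    """Compare each replica's believed action with the kernel-datapath action.

    Hypothetical labels (not the paper's classes):
      consistent      - all replicas agree and the datapath matches them
      replica_split   - replicas disagree; the datapath matches at least one
      datapath_orphan - the datapath applies an action no replica holds
    """
    believed = set(replica_actions.values())
    if dp_action not in believed:
        return "datapath_orphan"
    if len(believed) > 1:
        return "replica_split"
    return "consistent"


def lifetime_label(observations: List[Tuple[float, bool]],
                   horizon_s: float = 30.0) -> str:
    """Label a divergence from time-ordered (seconds_since_detection, still_divergent) probes."""
    for t, divergent in observations:
        if not divergent:
            return "transient"       # converged before reaching the horizon
        if t >= horizon_s:
            return "persistent"      # still divergent at or past the horizon
    return "undetermined"            # probing stopped before the horizon
```

Keeping trace parsing separate from the verdict logic lets the oracle be tested on captured trace text without a live Open vSwitch instance.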
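The detection rate is reported with a Wilson 95% score interval. A minimal sketch of the standard Wilson formula for a binomial proportion follows; the function name is illustrative, and `z = 1.96` corresponds to 95% coverage.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion successes / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    # Center is pulled toward 1/2; the half-width adds a z^2/(4n^2) term.
    center = (p + z2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return center - half, center + half
```

Unlike the normal-approximation interval, the Wilson interval stays inside [0, 1] even when every trial or no trial succeeds, which makes it a safer choice for detection rates near the extremes.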
Problem

Research questions and friction points this paper is trying to address.

Distributed SDN
State divergence
Replication race
Control-data plane inconsistency
Concurrency bug
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-plane divergence
distributed SDN fuzzing
concurrency race
stateful fuzzing
kernel datapath probing