Distributed Recoverable Sketches

📅 2025-11-07
🏛️ International Conference on Principles of Distributed Systems
🤖 AI Summary
To address sketch data loss caused by switch failures in network monitoring, this paper proposes a distributed recoverable sketch framework that supports fault-tolerant recovery of linear frequency-estimation sketches such as the Count-Min Sketch. Methodologically, it introduces a dual-mode mechanism (periodic full-state synchronization and incremental updates) and uses incremental encoding and batched change representation, so that multiple sketch structures can be plugged in through an abstract API. Its key contribution is the first incorporation of distributed collaborative recovery into sketch data management, with a modular design that balances storage, computation, and communication overheads. Experiments demonstrate that, compared to full-state synchronization, the incremental strategy reduces communication volume by 62% and recovery latency by 47%, while preserving the theoretical error bounds of the frequency estimates. The framework thereby improves the reliability and scalability of large-scale network monitoring systems.

📝 Abstract
Sketches are commonly used in computer systems and network monitoring tools to provide efficient query execution while maintaining a compact data representation. Switches and routers maintain sketches to track statistical characteristics of network traffic. The availability of such data is essential for network analysis as a whole. Consequently, being able to recover sketches after a switch crash is critical. In this work, we explore how nodes in a network environment can cooperate to recover sketch data whenever any subset of them crashes. In particular, we focus on linear sketches for frequency estimation, such as the Count-Min Sketch. We consider various approaches to ensure data reliability and explore the trade-offs between space consumption, runtime overheads, and traffic during recovery, which we point out as design guidelines. Besides different aspects of efficacy, we design a modular system for ease of maintenance and further scaling. A key aspect we examine is how the nodes update each other regarding their sketch content as it evolves over time. In particular, we compare periodic full updates versus incremental updates. We also examine several data structures to economically represent and encode a batch of the latest changes. Our framework is generic, and other data structures can be plugged in via an abstract API as long as they implement the corresponding API methods.
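To make the contrast between periodic full updates and incremental updates concrete, here is a minimal Python sketch. It is an illustration only and not the paper's implementation: the class names `CountMinSketch` and `Backup`, the method names, the sketch dimensions, and the use of salted BLAKE2b as the per-row hash are all assumptions. It relies on the linearity of the Count-Min Sketch: counters only ever receive additive changes, so a peer that replays the per-cell deltas ends up with the same counters as the original node.

```python
import hashlib


class CountMinSketch:
    """A small Count-Min Sketch: `depth` rows of `width` counters each."""

    def __init__(self, depth=4, width=64):
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]
        self.delta = {}  # (row, col) -> change accumulated since the last sync

    def _col(self, row, key):
        # One hash function per row, derived by salting with the row index.
        digest = hashlib.blake2b(
            key.encode(), salt=bytes([row]), digest_size=8
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        for r in range(self.depth):
            c = self._col(r, key)
            self.rows[r][c] += count
            self.delta[(r, c)] = self.delta.get((r, c), 0) + count

    def estimate(self, key):
        # The minimum over the rows never underestimates the true count.
        return min(self.rows[r][self._col(r, key)] for r in range(self.depth))

    def full_update(self):
        """Periodic full update: a copy of every counter."""
        self.delta = {}
        return [row[:] for row in self.rows]

    def incremental_update(self):
        """Incremental update: only the cells changed since the last sync."""
        batch, self.delta = self.delta, {}
        return batch


class Backup:
    """Hypothetical peer holding a redundant copy of another node's sketch."""

    def __init__(self, depth=4, width=64):
        self.rows = [[0] * width for _ in range(depth)]

    def apply_full(self, rows):
        self.rows = [row[:] for row in rows]

    def apply_incremental(self, batch):
        # Linearity: adding the deltas reproduces the sender's counters.
        for (r, c), d in batch.items():
            self.rows[r][c] += d
```

In this toy version a full update always carries `depth * width` counters, while an incremental update carries at most `depth` cells per distinct key seen since the last sync. Which of the two is cheaper depends on the traffic pattern and the sync interval, which is the kind of trade-off the abstract refers to.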
Problem

Research questions and friction points this paper is trying to address.

Recovering network sketch data after switch crashes in distributed systems
Exploring trade-offs between space consumption and recovery efficiency
Designing a modular framework for sketch updates using incremental approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Nodes cooperate to recover sketch data after crashes
System uses modular design for maintenance and scaling
Compares periodic versus incremental sketch update strategies
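The modular design can be pictured as an abstract interface for representing a batch of counter changes, with interchangeable implementations behind it. The Python sketch below is a hedged illustration under assumptions: `ChangeBatch`, `SparseBatch`, `DenseBatch`, `replay`, and their method names are hypothetical and are not taken from the paper, whose actual API is not described on this page.

```python
from abc import ABC, abstractmethod


class ChangeBatch(ABC):
    """Hypothetical interface for recording counter changes between syncs."""

    @abstractmethod
    def record(self, index, delta):
        """Accumulate `delta` for the counter at position `index`."""

    @abstractmethod
    def drain(self):
        """Return pending nonzero changes as sorted (index, delta) pairs and reset."""


class SparseBatch(ChangeBatch):
    """Hash-map representation: compact when few counters change."""

    def __init__(self):
        self._changes = {}

    def record(self, index, delta):
        self._changes[index] = self._changes.get(index, 0) + delta

    def drain(self):
        pairs = sorted((i, d) for i, d in self._changes.items() if d != 0)
        self._changes = {}
        return pairs


class DenseBatch(ChangeBatch):
    """Array representation: one slot per counter, no per-entry index overhead."""

    def __init__(self, size):
        self._deltas = [0] * size

    def record(self, index, delta):
        self._deltas[index] += delta

    def drain(self):
        pairs = [(i, d) for i, d in enumerate(self._deltas) if d != 0]
        self._deltas = [0] * len(self._deltas)
        return pairs


def replay(counters, pairs):
    """Apply drained changes to a peer's copy of the counters."""
    for i, d in pairs:
        counters[i] += d
    return counters
```

Because both classes satisfy the same interface, the code that ships and replays updates does not need to know which representation is in use. A sparse map tends to suit short sync intervals with few touched counters, while a dense array avoids index overhead when most counters change.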