🤖 AI Summary
This paper studies correlation clustering over dynamic graph streams, aiming to minimize the number of disagreements, i.e., pairs of similar vertices separated into different clusters or pairs of dissimilar vertices merged into the same cluster. Existing algorithms achieve only a 5-approximation in a single pass, or require multiple passes to reach a better approximation ratio. We propose a semi-streaming algorithm that attains a $(3+\varepsilon)$-approximation in a single pass over the stream; it is the first single-pass algorithm with this guarantee that uses polynomial-time post-processing. Our method builds on the analysis of the PIVOT algorithm and sparsifies the input graph, which lets it support both edge insertions and deletions in the general dynamic graph stream model. Compared to prior work, our algorithm improves the single-pass approximation ratio from 5 to $3+\varepsilon$, reduces the number of passes needed for a $(3+\varepsilon)$-approximation from $O(1/\varepsilon)$ to one, and extends to dynamic streams, all with provable guarantees.
📝 Abstract
Grouping together similar elements in datasets is a common task in data mining and machine learning. In this paper, we study streaming algorithms for correlation clustering, where each pair of elements is labeled either similar or dissimilar. The task is to partition the elements, and the objective is to minimize disagreements, that is, the number of dissimilar pairs that are grouped together plus the number of similar pairs that are separated. Our main contribution is a semi-streaming algorithm that achieves a $(3 + \varepsilon)$-approximation to the minimum number of disagreements using a single pass over the stream. In addition, the algorithm also works for dynamic streams. Our approach builds on the analysis of the PIVOT algorithm by Ailon, Charikar, and Newman [JACM'08], which obtains a $3$-approximation in the centralized setting. Our design allows us to sparsify the input graph by ignoring a large portion of the nodes and edges without incurring a large extra cost compared to the analysis of PIVOT. This sparsification makes our technique applicable in models such as semi-streaming, where sparse graphs can typically be handled much more efficiently. Our work improves on the approximation ratio of the recent single-pass $5$-approximation algorithm and on the number of passes of the recent $O(1/\varepsilon)$-pass $(3 + \varepsilon)$-approximation algorithm [Behnezhad, Charikar, Ma, Tan FOCS'22, SODA'23]. Our algorithm is also more robust and can be applied in dynamic streams. Furthermore, it is the first single-pass $(3 + \varepsilon)$-approximation algorithm that uses polynomial post-processing time.
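For readers unfamiliar with the baseline, the following is a minimal sketch of the classical centralized PIVOT procedure and of the disagreement objective described above. It is not the streaming algorithm of this paper: there is no sparsification and no stream processing. The function names (`build_adjacency`, `pivot`, `disagreements`) and the adjacency-dictionary representation are illustrative assumptions, not taken from the paper.

```python
import random
from itertools import combinations


def build_adjacency(similar_pairs):
    """Build a symmetric adjacency map from a list of similar pairs (hypothetical helper)."""
    adj = {}
    for u, v in similar_pairs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pivot(vertices, adj, rng=None):
    """Classical PIVOT: process vertices in random order; each still-unclustered
    vertex becomes a pivot and takes its unclustered similar neighbors with it."""
    rng = rng or random.Random()
    order = list(vertices)
    rng.shuffle(order)  # a uniformly random permutation fixes the pivot order
    clustered = set()
    clusters = []
    for v in order:
        if v in clustered:
            continue  # v was already absorbed by an earlier pivot
        cluster = {v} | {u for u in adj.get(v, ()) if u not in clustered}
        clustered |= cluster
        clusters.append(cluster)
    return clusters


def disagreements(vertices, adj, clusters):
    """Count similar pairs that are separated plus dissimilar pairs that are merged."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    cost = 0
    for u, v in combinations(vertices, 2):
        is_similar = v in adj.get(u, ())
        together = label[u] == label[v]
        if is_similar != together:
            cost += 1
    return cost


if __name__ == "__main__":
    # Path a - b - c: (a, b) and (b, c) are similar, (a, c) is dissimilar.
    vertices = ["a", "b", "c"]
    adj = build_adjacency([("a", "b"), ("b", "c")])
    clusters = pivot(vertices, adj, random.Random(0))
    print(clusters, disagreements(vertices, adj, clusters))
```

On the three-vertex path in the usage example, every choice of first pivot leaves exactly one disagreement, which is also the optimum for that instance. The 3-approximation of Ailon, Charikar, and Newman holds in expectation over the random pivot order, so on larger inputs a single run may be better or worse than three times the optimum.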