🤖 AI Summary
This paper studies community recovery in the Planted Partition Model (PPM) when the number of communities is arbitrarily large and their sizes are highly heterogeneous, in particular when sizes follow a power-law distribution. In this setting, conventional accuracy- or alignment-based evaluation metrics become unsuitable. The authors therefore redefine exact, almost exact, and weak recovery in terms of the correlation coefficient between partitions, which gives a unified formalization of the three regimes. They show that Diamond Percolation, an algorithm based on common neighbors, recovers the communities under mild assumptions on edge probabilities, with minimal restrictions on the number and sizes of communities. To the best of the authors' knowledge, these are the first recovery results for power-law-sized communities, a multiscale structure frequently found in real-world networks.
📝 Abstract
We analyze community recovery in the planted partition model (PPM) in regimes where the number of communities is arbitrarily large. We examine the three standard recovery regimes: exact recovery, almost exact recovery, and weak recovery. When communities vary in size, traditional accuracy- or alignment-based metrics become unsuitable for assessing the correctness of a predicted partition. To address this, we redefine these recovery regimes using the correlation coefficient, a more versatile metric for comparing partitions. We then demonstrate that Diamond Percolation, an algorithm based on common neighbors, successfully recovers communities under mild assumptions on edge probabilities, with minimal restrictions on the number and sizes of communities. As a key application, we consider the case where community sizes follow a power-law distribution, a characteristic frequently found in real-world networks. To the best of our knowledge, we provide the first recovery results for such unbalanced partitions.
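
The two ingredients named in the abstract can be illustrated with a small Python sketch. It rests on assumptions that the text above does not spell out. First, the correlation coefficient is taken to be the Pearson correlation between the pairwise co-membership indicators of two partitions. Second, Diamond Percolation is taken to keep only the edges whose endpoints share at least two common neighbors and to return the connected components of what remains. The function names and the `min_common` parameter are hypothetical and chosen for illustration only; the paper's exact definitions may differ.

```python
from itertools import combinations
from math import sqrt


def pair_correlation(labels_a, labels_b):
    """Pearson correlation between the co-membership indicators of two partitions.

    Assumed definition: each partition is encoded as a binary vector over
    vertex pairs (1 if the pair shares a community) and the two vectors
    are correlated.
    """
    n = len(labels_a)
    total = n * (n - 1) // 2          # number of vertex pairs
    m_a = m_b = m_ab = 0              # intra-community pair counts
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        m_a += same_a
        m_b += same_b
        m_ab += same_a and same_b
    denom = sqrt(m_a * (total - m_a) * m_b * (total - m_b))
    if denom == 0:
        raise ValueError("correlation is undefined for a trivial partition")
    return (total * m_ab - m_a * m_b) / denom


def diamond_percolation(n, edges, min_common=2):
    """Sketch of a common-neighbor percolation on vertices 0..n-1.

    Assumption: an edge is kept when its endpoints share at least
    `min_common` neighbors; the output labels are the connected
    components of the kept edges.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    parent = list(range(n))           # union-find over vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u in range(n):
        for v in adj[u]:
            if u < v and len(adj[u] & adj[v]) >= min_common:
                parent[find(u)] = find(v)
    return [find(v) for v in range(n)]


# Toy graph: two 4-cliques joined by a single bridge edge (3, 4).
clique_1 = list(combinations(range(0, 4), 2))
clique_2 = list(combinations(range(4, 8), 2))
toy_edges = clique_1 + clique_2 + [(3, 4)]
predicted = diamond_percolation(8, toy_edges)
planted = [0, 0, 0, 0, 1, 1, 1, 1]
score = pair_correlation(predicted, planted)
```

In this toy graph every clique edge has two common neighbors while the bridge has none, so the bridge is discarded and the two cliques come back as separate components. Because the score is computed over vertex pairs, it needs no matching between predicted and planted community labels, which is what makes it usable when the number and sizes of communities vary widely.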