🤖 AI Summary
This study addresses exact community recovery in networks with node-side information, such as attributes or labels, under the Data Block Model (DBM), a stochastic block model augmented with node-associated data. By introducing the Chernoff–TV divergence, the work establishes a sharp information-theoretic threshold for exact recovery in the DBM and proves a matching converse: exact recovery is impossible below this threshold. The authors also give an efficient community detection algorithm that achieves the threshold. Simulations validate the theoretical predictions and show that incorporating node-side data improves community detection.
📝 Abstract
Community detection in networks is a fundamental problem in machine learning and statistical inference, with applications in social networks, biological systems, and communication networks. The stochastic block model (SBM) serves as a canonical framework for studying community structure, and exact recovery, identifying the true communities with high probability, is a central theoretical question. While classical results characterize the phase transition for exact recovery based solely on graph connectivity, many real-world networks contain additional data, such as node attributes or labels. In this work, we study exact recovery in the Data Block Model (DBM), an SBM augmented with node-associated data, as formalized by Asadi, Abbe, and Verdú (2017). We introduce the Chernoff–TV divergence and use it to characterize a sharp exact recovery threshold for the DBM. We further provide an efficient algorithm that achieves this threshold, along with a matching converse result showing impossibility below the threshold. Finally, simulations validate our findings and demonstrate the benefits of incorporating vertex data as side information in community detection.
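As a rough illustration of the setting described above, the sketch below samples a two-community block model in which every node also carries one noisy copy of its community label. The function name `sample_dbm`, the parameters `a`, `b`, and `flip_prob`, the logarithmic-degree edge probabilities, and the binary noisy-label form of the side information are all assumptions made for illustration; the abstract does not specify the paper's data model, and this is not the authors' algorithm.

```python
import math
import random


def sample_dbm(n, a, b, flip_prob, seed=0):
    """Sample a two-community block model with one noisy label per node.

    Hypothetical parameterization (an assumption, not taken from the paper):
      - same-community edge probability      p = a * log(n) / n
      - different-community edge probability q = b * log(n) / n
      - side information: the true label, flipped independently
        with probability flip_prob
    """
    rng = random.Random(seed)
    p = min(1.0, a * math.log(n) / n)
    q = min(1.0, b * math.log(n) / n)

    # Alternate labels 0/1, then shuffle, so the two communities
    # differ in size by at most one node.
    labels = [i % 2 for i in range(n)]
    rng.shuffle(labels)

    # Draw each undirected edge (u, v) with u < v independently.
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if labels[u] == labels[v] else q
            if rng.random() < prob:
                edges.add((u, v))

    # Node-attached data: a noisy copy of the community label.
    side_info = [x ^ int(rng.random() < flip_prob) for x in labels]
    return labels, edges, side_info


labels, edges, side_info = sample_dbm(n=200, a=8.0, b=1.0, flip_prob=0.2)
```

With `flip_prob=0.5` the side information is independent of the communities and the sample reduces to a plain SBM, while smaller values make the node data more informative; comparing recovery accuracy across such settings is one way to explore the effect the abstract describes.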