🤖 AI Summary
This work addresses community detection in randomly grown networks when only the final graph topology is observed and the growth process is unknown. The model lies outside the Stochastic Block Model (SBM) framework: K separate preferential attachment trees, each representing one community, are connected by Erdős–Rényi random edges. The authors prove that no algorithm can consistently recover the community labels of all nodes, but show that the labels of sufficiently central nodes, with centrality measured by arrival time or degree, can be provably recovered. The proposed procedure has two stages: it first classifies high-degree nodes and then extends the community assignments to the remaining vertices. Numerical experiments and an application to a real coauthorship network demonstrate the effectiveness of the approach.
📝 Abstract
We study community detection on Markovian random networks outside of the Stochastic Block Model (SBM) framework. Specifically, we consider a random network growth process which generates $K$ separate preferential attachment trees and connects them with Erdős--Rényi edges, so that each tree represents a community and each node inherits the label of the tree to which it belongs. This model can produce many features of real-world networks that are improbable under the SBM, such as power-law degree distributions and the existence of chains and hubs. Given only the final graph, without any knowledge of the growth process, we seek to recover the unobserved community membership of the nodes. We first prove that it is impossible for any algorithm to consistently recover the community labels of all the nodes. However, we design algorithms which are provably able to recover the community labels of subsets of central nodes, for several different notions of node centrality such as arrival time or degree. Our procedure consists of two stages: in the first stage, we classify high-degree nodes and, in the second stage, we extend the community assignments to the remaining vertices. Numerical experiments and a real-data application on a coauthorship network demonstrate the effectiveness of our proposed approach.
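To make the generative model concrete, the following is a minimal Python sketch of one way such a network could be sampled. It is an illustration under stated assumptions, not the authors' exact specification: the function name `grow_pa_forest_with_er`, the round-robin arrival of nodes across trees, the degree-proportional attachment rule, and the choice to overlay Erdős–Rényi edges with probability `q` on all node pairs are all hypothetical choices that the abstract does not pin down.

```python
import random
from collections import Counter


def grow_pa_forest_with_er(n, K, q, seed=0):
    """Sample K preferential attachment trees on n nodes, then add ER edges.

    Returns (edges, labels): edges is a set of pairs (u, v) with u < v, and
    labels[v] is the index of the tree (community) that node v belongs to.
    """
    rng = random.Random(seed)
    # Assumption: nodes arrive round-robin, so node v joins tree v % K
    # and node k (for k < K) is the root of tree k.
    labels = [v % K for v in range(n)]
    edges = set()
    # endpoints[k] holds each node of tree k once per incident tree edge,
    # so a uniform draw from it is a degree-proportional draw.
    endpoints = [[] for _ in range(K)]
    for v in range(K, n):
        k = labels[v]
        # The first non-root node of a tree attaches to the root.
        parent = rng.choice(endpoints[k]) if endpoints[k] else k
        edges.add((parent, v))  # parent always arrived before v
        endpoints[k] += [parent, v]
    # Assumption: independent ER edges with probability q on every pair.
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < q:
                edges.add((u, v))
    return edges, labels


# Usage: sample a small graph and compute the degree of every node.
edges, labels = grow_pa_forest_with_er(n=300, K=2, q=0.01, seed=42)
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
top_nodes = [node for node, _ in degree.most_common(5)]
```

In this sketch the highest-degree nodes tend to be early arrivals, which mirrors the abstract's use of arrival time and degree as notions of centrality. Only `edges` would be available to a detection algorithm; `labels` is the hidden ground truth.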