AI Summary
This work addresses the challenge of integrating graph structure and node attributes for unsupervised clustering in attributed graphs. The authors propose a multi-round self-training framework based on graph neural networks (GNNs) that alternately refines node representations and cluster assignments without supervision. In each round, the current clustering result is used to reconstruct the graph structure, which is then fused with the original graph to form a context-aware graph for generating improved representations. The key idea is the dynamic, joint use of both edge structure and node attributes, in contrast to conventional single-round training or reliance on a single information source. Experiments on synthetic data show that the method outperforms baselines using only structure or only attributes when neither source is very informative, and that multi-round learning surpasses a long single round of training. On real-world datasets, the method is competitive with state-of-the-art methods when cluster sizes are balanced.
Abstract
Graph clustering, the task of partitioning the node set of a graph into disjoint subsets that reflect some latent information, is a fundamental problem with applications in a myriad of scenarios. While this classic problem has been tackled for decades by different communities, a recent variation driven by real data considers the scenario where nodes have attributes that are also informative. This has triggered new methods that simultaneously leverage network information (edges) and node information (attributes) in the design of clustering algorithms. This work proposes a novel framework that builds on prior works that have applied graph neural networks (GNNs) to graph clustering. The proposed framework operates in rounds of self-learning in a fully unsupervised setting. In each round, a GNN generates node representations that are used to cluster the nodes. This clustering influences the graph used to generate the node representations in the next round. Moreover, a context graph built in each round using the original graph is used to generate the node representations. Empirical results show that the proposed methodology extracts information from both network edges and node attributes in synthetic data, outperforming algorithms focused solely on the network or on the attributes when neither is very informative. Multiple rounds of learning also improve performance and always outperform a long single round of training (i.e., classic GNN graph clustering). On real datasets, empirical results indicate that the proposed methodology is competitive with state-of-the-art methods when cluster sizes are balanced.
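The round structure described above can be illustrated with a small sketch. It is not the authors' implementation, and every choice in it is an assumption made for illustration: the trained GNN is replaced by untrained weighted neighborhood averaging (`propagate`), the clustering step is plain k-means with a deterministic initialisation, and the context graph is formed by adding a weight `alpha` between nodes that currently share a cluster (`fuse`). All function names and parameters are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def propagate(graph, feats, hops=2):
    """Assumed stand-in for a GNN encoder: weighted neighborhood averaging
    with a self-loop of weight 1. `graph` is a weighted adjacency matrix."""
    h = [row[:] for row in feats]
    n, d = len(h), len(h[0])
    for _ in range(hops):
        nxt = []
        for i in range(n):
            total = 1.0 + sum(graph[i])  # self-loop plus incident edge weights
            nxt.append([(h[i][c] + sum(graph[i][j] * h[j][c] for j in range(n))) / total
                        for c in range(d)])
        h = nxt
    return h

def kmeans(points, k, iters=20):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0][:]]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(far[:])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def fuse(adj, labels, alpha=1.0):
    """Hypothetical context graph: original edges plus weight `alpha`
    between every pair of distinct nodes in the same current cluster."""
    n = len(adj)
    return [[adj[i][j] + (alpha if i != j and labels[i] == labels[j] else 0.0)
             for j in range(n)] for i in range(n)]

def multi_round_clustering(adj, feats, k, rounds=3, alpha=1.0):
    """Alternate between representation generation and clustering."""
    graph, labels = adj, None
    for _ in range(rounds):
        h = propagate(graph, feats)       # representations from the current graph
        labels = kmeans(h, k)             # cluster the representations
        graph = fuse(adj, labels, alpha)  # clustering shapes the next round's graph
    return labels

# Toy input: two triangles joined by one edge, with noisy 1-D attributes.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
feats = [[0.1], [0.0], [0.3], [0.7], [1.0], [0.9]]
labels = multi_round_clustering(adj, feats, k=2)
```

On this toy input the two triangles end up in different clusters. The additive fusion rule is only one plausible reading of "a context graph built in each round using the original graph"; the abstract does not specify how the graphs are combined or how the GNN is trained.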