AI Summary
Existing community detection methods often yield disconnected or weakly connected clusters on large-scale graphs, compromising interpretability and robustness. Post-processing techniques such as Well-Connected Clusters (WCC) and Connectivity Modifier (CM) improve cluster connectivity, but their computational cost is prohibitive on massive graphs. This paper introduces optimized parallel implementations of WCC and CM in HPE Chapel, enabling well-connected community detection on graphs with more than two billion edges in minutes, a result that was previously infeasible to compute. The algorithms use Chapel's parallel constructs to scale on modern multicore architectures and are integrated into the open-source Arkouda/Arachne framework. Using 128 cores, the implementations run WCC and CM on the OpenAlex graph (over 2 billion edges) in minutes. The work extends the scale at which connectivity-preserving community detection is practical.
Abstract
Community detection plays a central role in uncovering mesoscale structures in networks. However, existing methods often produce disconnected or weakly connected clusters, undermining interpretability and robustness. The Well-Connected Clusters (WCC) and Connectivity Modifier (CM) algorithms are post-processing techniques that improve the accuracy of many clustering methods, but they are computationally prohibitive on massive graphs. In this work, we present optimized parallel implementations of WCC and CM using the HPE Chapel programming language. First, we design fast and efficient parallel algorithms that leverage Chapel's parallel constructs to achieve substantial performance improvements and scalability on modern multicore architectures. Second, we integrate this software into Arkouda/Arachne, an open-source, high-performance framework for large-scale graph analytics. Our implementations uniquely enable well-connected community detection on massive graphs with more than 2 billion edges, providing a practical solution for connectivity-preserving clustering at web scale. For example, our implementations of WCC and CM enable community detection on the OpenAlex dataset, which has more than 2 billion edges, in minutes using 128 cores, a result that was previously infeasible to compute.
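To make the "disconnected cluster" problem concrete, here is a minimal Python sketch that flags clusters whose induced subgraph is not connected. It is an illustration only, not the paper's WCC or CM algorithm and not the Arkouda/Arachne API; the function name `disconnected_clusters` and the edge-list and label-dictionary inputs are assumptions made for this example.

```python
from collections import defaultdict


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def disconnected_clusters(edges, labels):
    """Return sorted ids of clusters whose induced subgraph is not connected.

    edges  : iterable of (u, v) pairs (undirected)
    labels : dict mapping node -> cluster id (hypothetical input format)
    """
    parent = {v: v for v in labels}
    for u, v in edges:
        # Only edges with both endpoints in the same cluster count
        # toward that cluster's internal connectivity.
        if u in labels and v in labels and labels[u] == labels[v]:
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                parent[ru] = rv

    # Collect the distinct union-find roots seen inside each cluster.
    roots = defaultdict(set)
    for v, c in labels.items():
        roots[c].add(find(parent, v))

    # More than one root means the cluster splits into several pieces.
    return sorted(c for c, r in roots.items() if len(r) > 1)


# A path graph 0-1-2-3-4-5 where cluster "a" is split by cluster "b".
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
labels = {0: "a", 1: "a", 2: "b", 3: "b", 4: "a", 5: "a"}
result = disconnected_clusters(edges, labels)
print(result)
```

In this toy graph, cluster "a" contains nodes 0, 1, 4, and 5 but is connected only through nodes of cluster "b", so it is reported as disconnected. A check like this only detects the problem; repairing it is the job of post-processing methods such as those the abstract describes.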