🤖 AI Summary
Problem: Existing substructure discovery methods for homogeneous multilayer networks (MLNs) scale poorly because of tight inter-layer coupling. Method: This paper proposes a decoupled cross-layer subgraph discovery framework: (i) connected subgraphs are mined independently in each layer; (ii) the per-layer results are fused by a distributed composition algorithm; and (iii) a multilayer consistency verification mechanism ensures accuracy. Because inter-layer computation is decoupled, the method admits a linearly scalable MapReduce-based implementation. Results: Experiments on large-scale synthetic and real-world datasets show that the approach achieves an 8.2× speedup over conventional single-layer aggregation methods while significantly reducing response latency. It maintains high accuracy and strong scalability, addressing the long-standing trade-off between precision and efficiency in multilayer network analysis.
📝 Abstract
Graph mining analyzes real-world graphs to find core substructures (connected subgraphs) in applications modeled as graphs. Substructure discovery identifies meaningful patterns, structures, or components within a large data set. These substructures can be of various types, such as frequent patterns, motifs, or other relevant features of the data. For modeling complex data sets with multiple types of entities and relationships, multilayer networks (MLNs) have been shown to be more effective than simple and attributed graphs. Analysis algorithms on MLNs that use the decoupling approach have been shown to be both efficient and accurate. Hence, this paper focuses on substructure discovery in homogeneous multilayer networks (one type of MLN) using a novel decoupling-based approach. In this approach, each layer is processed independently, and the results from two or more layers are then composed to identify substructures in the entire MLN. The algorithm, including the composition step, is designed and implemented in the Map/Reduce distributed processing paradigm to provide scalability. After establishing correctness, we analyze the speedup and response time of the proposed algorithm through extensive experiments on large synthetic and real-world data sets with diverse graph characteristics.
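The decoupling idea described above (process each layer on its own, then compose the per-layer results) can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes, purely for illustration, that a layer's "substructures" are its connected components and that composition keeps the connected edge sets shared by all layers. Plain Python stands in for Map/Reduce, and the names `mine_layer`, `compose`, and `discover` are hypothetical.

```python
from functools import reduce


def canonical(edge):
    """Order the endpoints so (2, 1) and (1, 2) denote the same undirected edge."""
    u, v = edge
    return (u, v) if u <= v else (v, u)


def split_connected(edges):
    """Split an undirected edge set into connected components (frozensets of edges)."""
    remaining = set(edges)
    components = []
    while remaining:
        first = remaining.pop()
        component = {first}
        nodes = set(first)
        grew = True
        while grew:
            # Pull in every remaining edge that touches the component so far.
            touching = {e for e in remaining if e[0] in nodes or e[1] in nodes}
            grew = bool(touching)
            remaining -= touching
            component |= touching
            for e in touching:
                nodes.update(e)
        components.append(frozenset(component))
    return components


def mine_layer(edges):
    """Map-like step (assumed): analyze one layer with no knowledge of the others."""
    return split_connected({canonical(e) for e in edges})


def compose(results_a, results_b):
    """Reduce-like step (assumed): combine two per-layer results.

    Shared edges of two components may be disconnected, so they are re-split.
    """
    composed = []
    for a in results_a:
        for b in results_b:
            composed.extend(split_connected(a & b))
    return composed


def discover(layers):
    """Compose the independently mined layers; expects at least one layer."""
    per_layer = [mine_layer(edges) for edges in layers.values()]
    return reduce(compose, per_layer)
```

A call such as `discover({"L1": [(1, 2), (2, 3), (3, 4), (5, 6)], "L2": [(2, 1), (3, 2), (4, 5), (6, 5)]})` mines each layer separately and only touches the per-layer results during composition. In a real Map/Reduce job the `mine_layer` calls could run on separate workers, since none of them reads another layer's data.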