CDC: A Simple Framework for Complex Data Clustering

📅 2024-03-06
🏛️ IEEE Transactions on Neural Networks and Learning Systems
📈 Citations: 0
Influential: 0
🤖 AI Summary
Unsupervised clustering of complex data—such as multi-view, non-Euclidean, and multi-relational structures—remains challenging due to the trade-off between modeling expressiveness and scalability. To address this, we propose CDC, a unified framework with linear time complexity. CDC integrates structural and attribute information via graph filtering, and introduces a similarity-preserving regularizer to adaptively learn high-quality anchor points, enabling efficient dimensionality reduction and scalable spectral clustering. It is the first framework to jointly handle diverse complex data types within a single architecture. We theoretically establish its cluster separability guarantee and demonstrate practical deployment on ultra-large graphs (up to 111M nodes). Extensive experiments show that CDC consistently outperforms state-of-the-art methods across multiple complex data benchmarks, while achieving O(n) time complexity for both training and inference and reducing memory consumption by 47%.
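The graph-filtering step mentioned above can be illustrated with a minimal sketch. The summary does not state which filter CDC uses, so the low-pass filter (I - L/2)^k below, with L the symmetric normalized Laplacian, is an assumption chosen because it is a common option; the function name `graph_filter` and its parameters are hypothetical.

```python
from math import sqrt


def graph_filter(edges, features, k=2):
    """Smooth node features with the low-pass filter (I - L/2)^k.

    ASSUMPTION: L = I - D^{-1/2} A D^{-1/2}; the filter actually used by
    CDC is not given in the summary. One pass therefore computes
    x <- (x + D^{-1/2} A D^{-1/2} x) / 2.
    """
    n = len(features)
    d = len(features[0])

    # Build adjacency lists for an undirected graph.
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    # Isolated nodes get a placeholder degree of 1 so the division is defined.
    inv_sqrt_deg = [1.0 / sqrt(max(len(nb), 1)) for nb in neighbors]

    x = [list(row) for row in features]
    for _ in range(k):
        new_x = []
        for i in range(n):
            row = []
            for j in range(d):
                # Degree-normalized sum over the neighbors of node i.
                agg = sum(
                    inv_sqrt_deg[i] * inv_sqrt_deg[v] * x[v][j]
                    for v in neighbors[i]
                )
                row.append(0.5 * (x[i][j] + agg))
            new_x.append(row)
        x = new_x
    return x
```

Each pass touches every edge once per feature dimension, so the cost grows linearly with the number of edges. For example, on two disconnected pairs of nodes, `graph_filter([(0, 1), (2, 3)], [[0], [2], [10], [12]], k=1)` pulls each pair to its mean, which makes the two groups easier to separate.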

📝 Abstract
In today's data-driven digital era, the volume and complexity of collected data, such as multiview, non-Euclidean, and multirelational data, are growing exponentially or even faster. Clustering, which extracts valid knowledge from data without supervision, is extremely useful in practice. However, existing methods are developed independently, each handling one particular challenge at the expense of the others. In this work, we propose a simple but effective framework for complex data clustering (CDC) that can efficiently process different types of data with linear complexity. We first use graph filtering (GF) to fuse geometric structure and attribute information. We then reduce complexity with high-quality anchors that are adaptively learned via a novel similarity-preserving (SP) regularizer. We illustrate the clusterability of the proposed method both theoretically and experimentally. In particular, we deploy CDC on graph data of size 111M.
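To see why anchors reduce complexity, consider the following minimal sketch of a point-to-anchor affinity matrix. It is not the authors' method: CDC learns its anchors adaptively with the SP regularizer, whereas this sketch takes the anchors as given and uses a Gaussian kernel, both of which are assumptions. The names `anchor_affinities` and `gamma` are hypothetical.

```python
from math import exp


def anchor_affinities(points, anchors, gamma=1.0):
    """Return an n x m row-stochastic matrix of point-to-anchor similarities.

    ASSUMPTION: anchors are fixed inputs and similarity is a Gaussian
    kernel; CDC instead learns anchors with a similarity-preserving
    regularizer that is not reproduced here.
    """
    z = []
    for p in points:
        # Squared Euclidean distance from this point to every anchor.
        dists = [
            sum((a - b) ** 2 for a, b in zip(p, anchor))
            for anchor in anchors
        ]
        # Shift by the smallest distance before exponentiating so that
        # at least one weight equals 1 and the row sum cannot underflow.
        d_min = min(dists)
        weights = [exp(-gamma * (dist - d_min)) for dist in dists]
        total = sum(weights)
        z.append([w / total for w in weights])
    return z
```

With n points, m anchors, and d features, building this matrix costs O(nmd), which is linear in n when m is fixed and much smaller than n. A full n x n similarity matrix would instead cost O(n²d). Anchor-based spectral methods typically then work with this n x m matrix rather than the full graph; that step is omitted here.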
Problem

Research questions and friction points this paper is trying to address.

Complex Data Clustering
Multi-perspective Data
Big Data Applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Complex Data Clustering (CDC)
Graph Filtering Technique
Large-scale Dataset Processing
Authors
Zhao Kang
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
Xuanting Xie
University of Electronic Science and Technology of China
Graph Neural Networks, Clustering
Bingheng Li
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
Erlin Pan
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China