Learning Self-Growth Maps for Fast and Accurate Imbalanced Streaming Data Clustering

📅 2024-04-14
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the dual challenges of concept drift and cluster-size imbalance in streaming data clustering, which degrade both accuracy and computational efficiency, this paper proposes the Self-Growth Map (SGM) model and the SOHI hierarchical merging approach. SGM constructs a density-sensitive neural topology that grows on demand, robustly identifies small clusters, and incrementally adapts to evolving cluster structures. SOHI performs hierarchical cluster merging via local neighborhood retrieval on the maintained SGM, avoiding costly global search while preserving clustering accuracy and substantially improving runtime efficiency. The method integrates density-driven neuron allocation, SGM-guided incremental learning, and localized retrieval. Evaluated on multiple imbalanced streaming datasets, it achieves an average 12.7% improvement in clustering accuracy, a 3.2× speedup in execution time, and stable estimation of the true number of clusters, effectively mitigating small-cluster omission.

📝 Abstract
Streaming data clustering is a popular research topic in data mining and machine learning. Since streaming data is usually analyzed in data chunks, it is more susceptible to the dynamic cluster imbalance issue. That is, the imbalance ratio of clusters changes over time, which can easily lead to fluctuations in either the accuracy or the efficiency of streaming data clustering. Therefore, we propose an accurate and efficient streaming data clustering approach that adapts to drifting and imbalanced cluster distributions. We first design a Self-Growth Map (SGM) that automatically arranges neurons on demand according to the local distribution, and thus achieves fast and incremental adaptation to the streaming distributions. Since SGM allocates an excess number of density-sensitive neurons to describe the global distribution, it can avoid missing small clusters among imbalanced distributions. We also propose a fast hierarchical merging strategy to combine the neurons that break up the relatively large clusters. It exploits the maintained SGM to quickly retrieve the intra-cluster distribution pairs for merging, which circumvents the most laborious global searching. It turns out that the proposed SGM can incrementally adapt to the distributions of new chunks, and the Self-grOwth map-guided Hierarchical merging for Imbalanced data clustering (SOHI) approach can quickly explore the true number of imbalanced clusters. Extensive experiments demonstrate that SOHI can efficiently and accurately explore cluster distributions for streaming data.
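The idea of arranging neurons on demand, chunk by chunk, can be illustrated with a minimal sketch. This is not the authors' SGM: the function name `grow_map`, the fixed `radius` growth rule, and the learning rate `lr` are all assumptions made for illustration, and the abstract does not specify the actual growth criterion.

```python
import math

def grow_map(chunk, neurons, radius=1.0, lr=0.1):
    """Update `neurons` in place with one chunk of points (illustrative only).

    neurons: list of dicts {"w": [floats], "hits": int}.
    Assumed rule: a point farther than `radius` from every neuron spawns a
    new neuron; otherwise the nearest neuron moves slightly toward the point.
    """
    for x in chunk:
        best, best_d = None, math.inf
        for n in neurons:
            d = math.dist(n["w"], x)
            if d < best_d:
                best, best_d = n, d
        if best is None or best_d > radius:
            # No neuron covers this region yet: grow the map here.
            neurons.append({"w": list(x), "hits": 1})
        else:
            # Pull the winning neuron toward the sample and count the hit.
            best["w"] = [w + lr * (xi - w) for w, xi in zip(best["w"], x)]
            best["hits"] += 1
    return neurons

# Two chunks: a dense group near the origin, then small groups elsewhere.
neurons = []
grow_map([(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0)], neurons)
grow_map([(5.1, 5.0), (10.0, 0.0)], neurons)
hits = [n["hits"] for n in neurons]
print(len(neurons), hits)
```

In this toy run the isolated point at (10, 0) gets its own neuron even though it arrives alone, which is the behavior that keeps a small cluster from being absorbed by a large one. The existing neurons are reused across chunks, so each new chunk only adjusts or extends the map.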
Problem

Research questions and friction points this paper is trying to address.

Address dynamic cluster imbalance in streaming data clustering
Adapt to drifting and imbalanced cluster distributions efficiently
Accurately identify small clusters in imbalanced data streams
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Growth Map adapts dynamically to data chunks
Density-sensitive neurons prevent missing small clusters
Hierarchical merging strategy avoids global searching
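Why restricting merge candidates to map neighbors avoids a global search can be shown with a short sketch. It is a simplification under stated assumptions, not the SOHI procedure: the `edges` neighborhood list, the `max_gap` distance threshold, and the function name `merge_by_neighbors` are hypothetical, and the paper's actual merging criterion is not given in this summary.

```python
import math

def merge_by_neighbors(weights, edges, max_gap):
    """Group neurons by merging only pairs that are linked in the map.

    weights: list of neuron positions.
    edges: (i, j) index pairs assumed to be kept by the map as neighbors.
    max_gap: assumed rule, merge a linked pair if its distance <= max_gap.
    Only len(edges) pairs are examined, not all n*(n-1)/2 pairs.
    """
    parent = list(range(len(weights)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        if math.dist(weights[i], weights[j]) <= max_gap:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri
    return [find(i) for i in range(len(weights))]

# Three neurons split one large cluster; a fourth covers a small, distant one.
weights = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
edges = [(0, 1), (1, 2), (2, 3)]
labels = merge_by_neighbors(weights, edges, max_gap=1.5)
print(labels, len(set(labels)))
```

Here the three neurons that over-segment the large cluster collapse into one label, while the distant neuron keeps its own, so the estimated number of clusters is two.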
Yiqun Zhang
Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR, China, and also with the School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
Rong Zou
Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR, China
Yiu-ming Cheung
Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR, China
Sen Feng
Pengkai Wang
Zexi Tan
Xiaopeng Luo
Yuzhu Ji