🤖 AI Summary
Existing manifold clustering methods primarily focus on the joint optimization of K-means and manifold learning, overlooking both the consistency between the geometric structure of the data and the cluster labels, and class balance. To address this, we propose a label-guided manifold clustering framework. First, we explicitly incorporate label consistency into manifold graph construction, ensuring that the learned manifold structure is aligned with the final cluster assignments. Second, we introduce a Schatten *p*-norm maximization mechanism in the spectral domain to achieve balanced cluster sizes automatically. We provide theoretical guarantees on convergence and balance preservation. The framework is compatible with many types of distance functions and handles nonlinearly separable data. Extensive experiments on multiple benchmark datasets demonstrate significant improvements over state-of-the-art methods, validating the effectiveness of our structure-label consistency modeling and intrinsic class-balancing mechanism.
📝 Abstract
Manifold clustering, with its exceptional ability to capture complex data structures, holds a pivotal position in cluster analysis. However, existing methods often focus only on finding the optimal combination of K-means and manifold learning, and overlook the consistency between the data structure and the labels. To address this issue, we examine the relationship between K-means and manifold learning in depth and, on this basis, fuse them into a new clustering framework. Specifically, the algorithm uses the labels to guide the manifold structure and performs clustering on it, which ensures consistency between the data structure and the labels. Furthermore, to maintain class balance naturally during clustering, we maximize the Schatten p-norm of the label matrix and provide a theoretical proof to support this. Additionally, our clustering framework is designed to be flexible and compatible with many types of distance functions, which facilitates efficient processing of nonlinearly separable data. Experimental results on several datasets confirm the superiority of our proposed model.
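To give some intuition for why maximizing a Schatten p-norm of the labels can favor balanced clusters, the following is a minimal sketch and not the paper's algorithm. It assumes a hard one-hot label matrix and p = 1 (the nuclear norm); the abstract states neither the form of the label matrix nor the range of p. Under these assumptions the singular values of the label matrix are the square roots of the cluster sizes, so for a fixed number of samples the norm is largest when the sizes are equal. The helper names `schatten_p_norm` and `one_hot` are hypothetical and introduced only for illustration.

```python
import numpy as np


def schatten_p_norm(matrix, p):
    """Schatten p-norm: the l_p norm of the vector of singular values."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return float((singular_values ** p).sum() ** (1.0 / p))


def one_hot(labels, num_clusters):
    """Build an n x c indicator matrix from integer cluster labels."""
    indicator = np.zeros((len(labels), num_clusters))
    indicator[np.arange(len(labels)), labels] = 1.0
    return indicator


# Two assignments of 12 samples to 3 clusters (illustrative data only).
balanced = one_hot(np.repeat([0, 1, 2], [4, 4, 4]), 3)   # cluster sizes 4, 4, 4
skewed = one_hot(np.repeat([0, 1, 2], [10, 1, 1]), 3)    # cluster sizes 10, 1, 1

p = 1.0  # assumption: nuclear norm; the paper's choice of p is not given here
balanced_norm = schatten_p_norm(balanced, p)  # sqrt(4) + sqrt(4) + sqrt(4)
skewed_norm = schatten_p_norm(skewed, p)      # sqrt(10) + sqrt(1) + sqrt(1)

print(f"balanced: {balanced_norm:.3f}, skewed: {skewed_norm:.3f}")
```

In this toy setting the balanced assignment has the larger norm because the square root is concave. The same argument holds for any 0 < p < 2, while at p = 2 the norm reduces to the Frobenius norm and is identical for every hard assignment.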