Coreset Spectral Clustering

📅 2025-03-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Coreset methods for kernel k-means scale to large datasets with few clusters but tend to get stuck in local optima on sparse kernels, whereas spectral clustering works well on sparse graphs and scales to many clusters. This paper combines the two in CoreSet Spectral Clustering (CSSC). Leveraging the equivalence between kernel k-means and normalized cut, CSSC constructs a coreset graph, solves the normalized cut problem on it, and transfers the resulting cluster assignments back to the original graph. Theoretically, an α-approximate normalized cut solution on the coreset graph yields an O(α)-approximate solution on the original graph. Algorithmically, the running time of the state-of-the-art coreset construction for kernel k-means on sparse kernels is reduced from Õ(nk) to Õ(n·min{k, d_avg}), where d_avg is the average number of non-zero entries per row of the n×n kernel matrix. Experiments on large real-world graphs with many clusters confirm that the coreset construction is faster and show that CSSC avoids the local optima that trap coreset kernel k-means on sparse kernels.
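The three steps in the summary (build a coreset graph, cluster it, transfer labels back) can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the paper's algorithm: unweighted degree-proportional node sampling stands in for a real coreset construction (which would also attach importance weights), the label transfer is a simple volume-normalised connectivity heuristic, and every function name here is hypothetical.

```python
import itertools
import numpy as np


def planted_partition(n_per_block, k, p_in, p_out, rng):
    """Toy data: symmetric 0/1 adjacency matrix with k planted blocks."""
    truth = np.repeat(np.arange(k), n_per_block)
    n = truth.size
    probs = np.where(truth[:, None] == truth[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, 1)
    return (upper | upper.T).astype(float), truth


def spectral_labels(adj, k, n_iter=50):
    """Top-k eigenvectors of D^-1/2 A D^-1/2, row-normalised, then Lloyd's k-means."""
    deg = adj.sum(axis=1) + 1e-12
    _, vecs = np.linalg.eigh(adj / np.sqrt(np.outer(deg, deg)))
    emb = vecs[:, -k:]  # eigh sorts eigenvalues in ascending order
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    # Deterministic farthest-first initialisation of the k centres.
    centres = [emb[0]]
    for _ in range(k - 1):
        dist = np.min([np.sum((emb - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(emb[np.argmax(dist)])
    centres = np.array(centres)
    for _ in range(n_iter):
        sq = ((emb[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centres[c] = emb[labels == c].mean(axis=0)
    return labels


def coreset_spectral_sketch(adj, k, m, rng):
    """Cluster an m-node sampled subgraph, then label every node of the full graph."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    # ASSUMPTION: degree-proportional sampling as a stand-in for a coreset.
    core = rng.choice(n, size=m, replace=False, p=deg / deg.sum())
    core_labels = spectral_labels(adj[np.ix_(core, core)], k)
    # ASSUMPTION: transfer rule = strongest volume-normalised connection
    # from each node to the clustered coreset nodes.
    onehot = np.eye(k)[core_labels]
    volume = onehot.T @ deg[core] + 1e-12
    labels = np.argmax(adj[:, core] @ onehot / volume, axis=1)
    labels[core] = core_labels
    return labels, core


def accuracy_up_to_permutation(pred, truth, k):
    """Best agreement with the planted labels over all relabellings (small k only)."""
    return max(np.mean(np.array(p)[pred] == truth)
               for p in itertools.permutations(range(k)))


rng = np.random.default_rng(0)
adj, truth = planted_partition(100, 3, 0.4, 0.01, rng)
labels, core = coreset_spectral_sketch(adj, 3, 90, rng)
print("agreement with planted blocks:", accuracy_up_to_permutation(labels, truth, 3))
```

Only the 90-node sampled subgraph goes through the eigendecomposition, which is where the savings in this kind of pipeline come from. The remaining nodes are labelled with a single matrix product against the clustered sample.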

📝 Abstract

Coresets have become an invaluable tool for solving $k$-means and kernel $k$-means clustering problems on large datasets with small numbers of clusters. On the other hand, spectral clustering works well on sparse graphs and has recently been extended to scale efficiently to large numbers of clusters. We exploit the connection between kernel $k$-means and the normalised cut problem to combine the benefits of both. Our main result is a coreset spectral clustering algorithm for graphs that clusters a coreset graph to infer a good labelling of the original graph. We prove that an $\alpha$-approximation for the normalised cut problem on the coreset graph is an $O(\alpha)$-approximation on the original. We also improve the running time of the state-of-the-art coreset algorithm for kernel $k$-means on sparse kernels, from $\tilde{O}(nk)$ to $\tilde{O}(n \cdot \min\{k, d_{avg}\})$, where $d_{avg}$ is the average number of non-zero entries in each row of the $n \times n$ kernel matrix. Our experiments confirm our coreset algorithm is asymptotically faster on large real-world graphs with many clusters, and show that our clustering algorithm overcomes the main challenge faced by coreset kernel $k$-means on sparse kernels which is getting stuck in local optima.
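The connection between kernel $k$-means and normalised cut can be checked numerically. One standard way to state it, assumed here rather than taken from the paper, uses the kernel $K = \sigma D^{-1} + D^{-1} A D^{-1}$ with point weights equal to the node degrees. For a graph without self-loops, the weighted kernel $k$-means cost of any partition then equals its normalised cut plus the constant $\sigma(n - k) - k$, so both objectives rank partitions identically. The function names and the toy graph below are illustrative only.

```python
import numpy as np


def normalised_cut(adj, labels, k):
    """Sum over clusters of cut(cluster, rest) / vol(cluster)."""
    deg = adj.sum(axis=1)
    total = 0.0
    for c in range(k):
        inside = labels == c
        total += adj[inside][:, ~inside].sum() / deg[inside].sum()
    return total


def weighted_kernel_kmeans_cost(kernel, weights, labels, k):
    """sum_i w_i K_ii - sum_c (w_c^T K w_c) / (total weight of cluster c)."""
    cost = float(np.sum(weights * np.diag(kernel)))
    for c in range(k):
        w = np.where(labels == c, weights, 0.0)
        cost -= w @ kernel @ w / w.sum()
    return cost


rng = np.random.default_rng(1)
n, k, sigma = 40, 4, 1.0

# Random symmetric graph plus a ring so that every node has positive degree.
upper = np.triu(rng.random((n, n)) < 0.3, 1)
ring = np.roll(np.eye(n), 1, axis=1)
adj = np.maximum((upper | upper.T).astype(float), ring + ring.T)

deg = adj.sum(axis=1)
kernel = sigma * np.diag(1.0 / deg) + adj / np.outer(deg, deg)

# An arbitrary partition with no empty cluster.
labels = rng.permutation(np.arange(n) % k)

ncut = normalised_cut(adj, labels, k)
cost = weighted_kernel_kmeans_cost(kernel, deg, labels, k)
print(np.isclose(cost, ncut + sigma * (n - k) - k))
```

Because the two objectives differ only by a constant, an approximation guarantee for one can be translated into a statement about the other, which is the kind of bridge the abstract relies on.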

Problem

Research questions and friction points this paper is trying to address.

Combines coresets with spectral clustering to handle large graphs with many clusters.

Speeds up coreset construction for kernel k-means on sparse kernels.

Addresses the local optima that trap coreset kernel k-means on sparse kernels.
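To make the sparse-kernel running time concrete, the quantity d_avg and the factor min{k, d_avg} can be computed directly from a kernel matrix. The sketch below compares only the leading per-point factors of the two bounds and ignores the polylogarithmic terms hidden by the Õ notation. The helper names and the ring-shaped toy kernel are assumptions made for illustration.

```python
import numpy as np


def average_row_nonzeros(kernel):
    """d_avg: average number of non-zero entries per row of the kernel matrix."""
    return np.count_nonzero(kernel) / kernel.shape[0]


def leading_factor(kernel, k):
    """Per-point factor min{k, d_avg} in the improved bound (log factors ignored)."""
    return min(k, average_row_nonzeros(kernel))


# Toy sparse kernel: each point is similar to itself and its two ring neighbours.
n = 1000
eye = np.eye(n)
kernel = eye + np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1)

d_avg = average_row_nonzeros(kernel)
for k in (2, 50, 500):
    print(f"k={k}: old factor {k}, new factor {leading_factor(kernel, k)}")
```

With three non-zeros per row, the improved factor stays at 3 however large k grows, whereas the Õ(nk) bound scales linearly in k. For k below d_avg the two factors coincide.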

Innovation

Methods, ideas, or system contributions that make the work stand out.

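Clusters a coreset graph with spectral clustering and uses the result to infer a labelling of the original graph.

Improves the coreset running time for kernel k-means on sparse kernels from Õ(nk) to Õ(n·min{k, d_avg}).

Proves that an α-approximation for normalised cut on the coreset graph is an O(α)-approximation on the original graph.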
