A Method for Handling Negative Similarities in Explainable Graph Spectral Clustering of Text Documents -- Extended Version

📅 2025-04-16

🤖 AI Summary

This paper addresses the problem that modern document embeddings (e.g., GloVe, doc2vec) induce negative pairwise similarities, which corrupt the construction of the normalized Laplacian matrix and degrade spectral clustering performance. We systematically analyze the mechanisms by which such negative similarities impair graph-based clustering. To mitigate this issue, we propose a general similarity rectification framework comprising six strategies, including offset, truncation, and spectral shift. The framework makes explanation methods originally developed for Term Vector Space embeddings applicable, for the first time, to global embeddings such as GloVe. Experiments demonstrate that the framework significantly improves the stability and success rate of normalized Laplacian spectral clustering with GloVe. Three rectification variants consistently improve clustering accuracy across multiple benchmark datasets. The framework thereby strengthens both the robustness and the interpretability of spectral clustering for text documents.
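To make the idea of similarity rectification concrete, here is a minimal Python sketch of two of the strategies named above, offset and truncation, applied to a cosine similarity matrix. The function names and the exact definitions (shifting by the most negative entry, clipping at zero) are illustrative assumptions; the paper's own formulations may differ.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarities between the rows of X (values in [-1, 1])."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    U = X / norms
    return U @ U.T

def rectify_offset(S):
    """Assumed 'offset' variant: shift every entry so the smallest becomes zero.

    If S has no negative entries, it is returned unchanged.
    """
    return S - min(S.min(), 0.0)

def rectify_truncate(S):
    """Assumed 'truncation' variant: replace negative similarities with zero."""
    return np.maximum(S, 0.0)

# Toy embeddings: rows 0 and 1 point in opposite directions,
# so their cosine similarity is negative.
X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
S = cosine_similarity_matrix(X)
S_offset = rectify_offset(S)
S_trunc = rectify_truncate(S)
```

Both variants return a symmetric, entrywise non-negative matrix. Offset preserves the ordering of all similarities, while truncation leaves the non-negative entries untouched and discards only the negative ones.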

📝 Abstract

This paper investigates the problem of Graph Spectral Clustering (GSC) with negative similarities, which arise when documents are embedded in spaces other than the traditional Term Vector Space (e.g., doc2vec or GloVe). Solutions for combinatorial Laplacians and normalized Laplacians are discussed. An experimental investigation shows the advantages and disadvantages of six different solutions proposed in the literature and in this research. The research demonstrates that GloVe embeddings frequently cause failures of normalized Laplacian based GSC due to negative similarities. Furthermore, applying methods that cure similarity negativity improves accuracy for both combinatorial and normalized Laplacian based GSC. It also makes explanation methods originally developed by the authors for Term Vector Space embeddings applicable to GloVe embeddings.
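The following Python sketch illustrates one way such a failure can arise. The normalized Laplacian L = I - D^{-1/2} S D^{-1/2} needs the square root of each node's degree (its row sum of similarities), and negative similarities can make a degree zero or negative. This mechanism, the toy matrix, and the helper name `normalized_laplacian` are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def normalized_laplacian(S):
    """Return L = I - D^{-1/2} S D^{-1/2} for a symmetric similarity matrix S.

    Self-similarities are ignored (diagonal set to zero). Raises ValueError
    when any degree is non-positive, because D^{-1/2} is then undefined
    over the reals.
    """
    S = np.array(S, dtype=float)
    np.fill_diagonal(S, 0.0)
    d = S.sum(axis=1)
    if np.any(d <= 0):
        raise ValueError("non-positive degree: D^{-1/2} is undefined")
    inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(d)) - inv_sqrt[:, None] * S * inv_sqrt[None, :]

# Hypothetical similarity matrix with one negative pair (documents 0 and 2).
S_neg = np.array([
    [1.0, 0.5, -0.8],
    [0.5, 1.0, 0.2],
    [-0.8, 0.2, 1.0],
])

try:
    normalized_laplacian(S_neg)
    failed = False
except ValueError:
    failed = True  # degrees of documents 0 and 2 are negative

# After clipping negative similarities to zero, all degrees are positive.
L = normalized_laplacian(np.maximum(S_neg, 0.0))
eigenvalues = np.linalg.eigvalsh(L)
```

With the negative entry clipped to zero, every degree is positive and the eigenvalues of L lie in the interval [0, 2], as expected for a normalized Laplacian built from non-negative weights.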

Problem

Research questions and friction points this paper is trying to address.

Handling negative similarities in graph spectral clustering

Comparing solutions for combinatorial and normalized Laplacians

Improving accuracy for GloVe embeddings in clustering

Innovation

Methods, ideas, or system contributions that make the work stand out.

Handles negative similarities in spectral clustering

Compares six solutions for Laplacian-based clustering

Improves accuracy with negativity-curing methods
