A Method for Handling Negative Similarities in Explainable Graph Spectral Clustering of Text Documents -- Extended Version

📅 2025-04-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the problem that modern document embeddings (e.g., GloVe, doc2vec) induce negative pairwise similarities, which corrupt the construction of the normalized Laplacian matrix and degrade spectral clustering performance. It systematically analyzes the mechanisms by which such negative similarities impair graph-based clustering. To mitigate this issue, it proposes a general similarity rectification framework comprising six strategies (including offset, truncation, and spectral shift), enabling, for the first time, the adaptation of word-vector-space interpretability methods to global embeddings like GloVe. Experiments demonstrate that the framework significantly improves the stability and success rate of normalized Laplacian spectral clustering with GloVe. Three rectification variants consistently enhance clustering accuracy across multiple benchmark datasets. Moreover, the framework extends the applicability of existing interpretability techniques to modern text embeddings, thereby strengthening both the robustness and interpretability of spectral clustering for textual documents.
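Two of the rectification strategies named above, offset and truncation, can be illustrated with a minimal sketch. The function names `offset_similarity` and `truncate_similarity`, the choice of shift constant, and the toy matrix are assumptions made for illustration; the paper's exact definitions may differ.

```python
import numpy as np

def offset_similarity(S):
    """Add a constant so that the smallest off-diagonal similarity becomes zero.

    Assumption: the shift equals the magnitude of the most negative
    off-diagonal entry; other offsets are possible.
    """
    S = np.asarray(S, dtype=float)
    off_diag = ~np.eye(S.shape[0], dtype=bool)
    shift = max(0.0, -S[off_diag].min())
    R = S + shift
    np.fill_diagonal(R, 0.0)  # no self-loops in the similarity graph
    return R

def truncate_similarity(S):
    """Replace every negative similarity with zero."""
    R = np.maximum(np.asarray(S, dtype=float), 0.0)
    np.fill_diagonal(R, 0.0)  # no self-loops in the similarity graph
    return R

# Toy cosine-like similarities between three documents (hypothetical values).
S = np.array([[ 1.0,  0.8, -0.4],
              [ 0.8,  1.0, -0.3],
              [-0.4, -0.3,  1.0]])

S_offset = offset_similarity(S)
S_trunc = truncate_similarity(S)
print(S_offset)
print(S_trunc)
```

Offset preserves the ordering of all pairwise similarities, while truncation discards the information carried by negative values and can leave a document with no edges at all, as happens to the third document here.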

📝 Abstract
This paper investigates the problem of Graph Spectral Clustering (GSC) with negative similarities, which arise from document embeddings other than the traditional Term Vector Space (such as doc2vec, GloVe, etc.). Solutions for combinatorial Laplacians and normalized Laplacians are discussed. An experimental investigation shows the advantages and disadvantages of six different solutions proposed in the literature and in this research. The research demonstrates that GloVe embeddings frequently cause failures of normalized Laplacian based GSC due to negative similarities. Furthermore, applying methods that cure similarity negativity improves accuracy for both combinatorial and normalized Laplacian based GSC. It also makes explanation methods originally developed by the authors for Term Vector Space embeddings applicable to GloVe embeddings.
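One way negative similarities can break the normalized Laplacian is sketched below, using the standard definition L = I - D^{-1/2} S D^{-1/2}. The helper name `normalized_laplacian`, the toy matrix, and the simple constant shift used as a cure are illustrative assumptions, not the authors' implementation; the failure modes studied in the paper may be broader.

```python
import numpy as np

def normalized_laplacian(S):
    """Return I - D^{-1/2} S D^{-1/2} for a symmetric similarity matrix S."""
    S = np.asarray(S, dtype=float).copy()
    np.fill_diagonal(S, 0.0)          # ignore self-similarity
    degrees = S.sum(axis=1)
    if np.any(degrees <= 0):
        # With negative entries a row sum can be zero or negative,
        # so D^{-1/2} has no real value.
        raise ValueError("non-positive degree: D^{-1/2} is undefined")
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    return np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

# Hypothetical similarities; the third document is negatively related to both others.
S = np.array([[ 1.0,  0.8, -0.4],
              [ 0.8,  1.0, -0.3],
              [-0.4, -0.3,  1.0]])

try:
    normalized_laplacian(S)
    failed = False
except ValueError:
    failed = True                     # degree of document 3 is -0.7

# Assumed cure: shift all similarities so the minimum becomes zero.
S_shifted = S - S.min()
L = normalized_laplacian(S_shifted)
eigenvalues = np.linalg.eigvalsh(L)   # L is symmetric, so eigvalsh applies
print(failed, eigenvalues)
```

After the shift all degrees are positive and the eigenvalues fall in the usual range [0, 2], so the eigenvectors can be passed to k-means as in ordinary spectral clustering.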
Problem

Research questions and friction points this paper is trying to address.

Handling negative similarities in graph spectral clustering
Comparing solutions for combinatorial and normalized Laplacians
Improving accuracy for GloVe embeddings in clustering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Handles negative similarities in spectral clustering
Compares six solutions for Laplacian-based clustering
Improves accuracy with negativity-curing methods
Mieczyslaw A. Klopotek
Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warsaw, Poland
Slawomir T. Wierzchoń
Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warsaw, Poland
Bartlomiej Starosta
Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warsaw, Poland
Dariusz Czerski
Institute of Computer Science, Polish Academy of Sciences
artificial intelligence
P. Borkowski
Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warsaw, Poland