Attention Beyond Neighborhoods: Reviving Transformer for Graph Clustering

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Unsupervised graph clustering faces a fundamental trade-off: GNNs suffer from representation homogenization due to excessive local aggregation, while Transformers overlook local structural patterns due to global attention modeling. Method: We propose AGCN—a novel paradigm that natively embeds attention mechanisms into the graph topology, realizing “graph-as-attention.” Specifically: (1) adjacency relations are reformulated as learnable attention weights, enabling joint optimization of graph structure and attention; (2) a key-value (KV) cache mechanism is introduced to improve training efficiency; (3) a pairwise margin contrastive loss is designed to enhance discriminability in the attention space. Contribution/Results: Theoretical analysis demonstrates AGCN’s dual capability—local sensitivity and global information capture. Extensive experiments on multiple benchmark datasets show that AGCN consistently outperforms state-of-the-art methods, validating its effectiveness, generalizability, and computational efficiency.
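The "graph-as-attention" idea in step (1) can be illustrated with a minimal sketch: adjacency logits are treated as attention scores, row-normalized with a softmax, and used to aggregate node features. This is a simplified illustration under assumptions, not the paper's implementation; in AGCN the adjacency logits would be learnable and jointly optimized, and the names (`graph_as_attention`, `temperature`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable row-wise softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_as_attention(X, A, temperature=1.0):
    """Treat adjacency logits as attention scores: a row-wise
    softmax over A yields attention weights used to aggregate
    node features X. In AGCN these logits would be trainable
    parameters; here they are fixed for illustration."""
    # mask non-edges with a large negative logit so the attention
    # stays sensitive to local topology
    logits = np.where(A > 0, A / temperature, -1e9)
    attn = softmax(logits, axis=1)
    return attn @ X, attn

# toy 3-node graph with self-loops
A = np.array([[1., 1., 0.],
              [1., 1., 1.],
              [0., 1., 1.]])
X = np.eye(3)
Z, attn = graph_as_attention(X, A)
```

Each row of `attn` sums to 1, and non-edges receive (numerically) zero weight, so the aggregation respects the graph topology while the weights themselves remain free to be learned.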

📝 Abstract
Attention mechanisms have become a cornerstone of modern neural networks, driving breakthroughs across diverse domains. However, their application to graph-structured data, where capturing topological connections is essential, remains underexplored and underperforms compared to Graph Neural Networks (GNNs), particularly in the graph clustering task. GNNs tend to overemphasize neighborhood aggregation, homogenizing node representations; Transformers, conversely, tend to over-globalize, highlighting distant nodes at the expense of meaningful local patterns. This dichotomy raises a key question: is attention inherently redundant for unsupervised graph learning? To address it, we conduct a comprehensive empirical analysis that uncovers the complementary weaknesses of GNNs and Transformers in graph clustering. Motivated by these insights, we propose the Attentive Graph Clustering Network (AGCN), a novel architecture built on the view that the graph itself is attention. AGCN embeds the attention mechanism directly into the graph structure, enabling effective global information extraction while remaining sensitive to local topological cues. Our framework includes a theoretical analysis contrasting AGCN's behavior with that of GNNs and Transformers, and introduces two innovations: (1) a KV cache mechanism that improves computational efficiency, and (2) a pairwise margin contrastive loss that boosts the discriminative capacity of the attention space. Extensive experimental results demonstrate that AGCN outperforms state-of-the-art methods.
Problem

Research questions and friction points this paper is trying to address.

Addressing attention mechanism limitations in graph clustering tasks
Overcoming neighborhood overemphasis in GNNs and over-globalization in Transformers
Developing effective graph clustering with balanced local and global attention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Embedding attention into graph structure
KV cache mechanism for efficiency
Pairwise margin contrastive loss enhancement
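The pairwise margin contrastive loss listed above can be sketched as a standard margin-based contrastive objective: pull embeddings of positive pairs together and push negative pairs at least a margin apart. This is a generic illustration under assumptions, not the paper's exact formulation; the function name `pairwise_margin_loss` and its signature are hypothetical.

```python
import numpy as np

def pairwise_margin_loss(z, pos_pairs, neg_pairs, margin=1.0):
    """Generic pairwise margin contrastive loss over embeddings z:
    positive pairs incur squared distance, negative pairs incur a
    squared hinge penalty when closer than `margin`."""
    def dist(i, j):
        return np.linalg.norm(z[i] - z[j])
    pos = sum(dist(i, j) ** 2 for i, j in pos_pairs)
    neg = sum(max(0.0, margin - dist(i, j)) ** 2 for i, j in neg_pairs)
    n = len(pos_pairs) + len(neg_pairs)
    return (pos + neg) / max(n, 1)

# toy embeddings: nodes 0 and 1 coincide, node 2 is far away
z = np.array([[0., 0.], [0., 0.], [5., 0.]])
loss_sep = pairwise_margin_loss(z, [(0, 1)], [(0, 2)])
# 0.0: the positive pair coincides and the negative pair
# is already beyond the margin
```

Minimizing such a loss sharpens the attention space: same-cluster pairs collapse together while different-cluster pairs are forced apart by at least the margin, which is what makes the resulting representations discriminative for clustering.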
👥 Authors
Xuanting Xie
University of Electronic Science and Technology of China
Graph Neural Networks · Clustering
Bingheng Li
Michigan State University
Erlin Pan
Alibaba Group
Rui Hou
Member of Technical Staff, xAI
Large Language Model · Reasoning
Wenyu Chen
Massachusetts Institute of Technology
optimization · statistical learning
Zhao Kang
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China