🤖 AI Summary
This work addresses a limitation of existing contrastive learning–based attributed hypergraph clustering methods: they lack explicit clustering supervision and thus may encode clustering-irrelevant information into node representations. To overcome this, we propose CAHC, a novel approach that directly incorporates clustering supervision signals into the contrastive learning process. CAHC employs an end-to-end framework to jointly optimize node representations and cluster assignments, integrating dual-granularity contrastive learning at both the node and hyperedge levels with a differentiable clustering assignment mechanism. This design enables synergistic optimization between representation learning and clustering. Extensive experiments on eight real-world datasets demonstrate that CAHC significantly outperforms current state-of-the-art baselines, confirming its effectiveness and robustness.
📝 Abstract
Contrastive learning has demonstrated strong performance in attributed hypergraph clustering. Existing contrastive methods typically first learn node embeddings and then apply a clustering algorithm, such as k-means, to these embeddings to obtain the clustering results. However, these methods lack direct clustering supervision, risking the inclusion of clustering-irrelevant information in the learned representations. To this end, we propose a Contrastive learning approach for Attributed Hypergraph Clustering (CAHC), an end-to-end method that simultaneously learns node embeddings and produces clustering results. CAHC consists of two main steps: representation learning and cluster assignment learning. The former employs a novel contrastive learning approach that incorporates both node-level and hyperedge-level objectives to generate node embeddings. The latter jointly optimizes embeddings and clustering, refining the embeddings with clustering-oriented guidance while obtaining the clustering results simultaneously. Extensive experimental results demonstrate that CAHC outperforms baselines on eight datasets.
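The two ingredients described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's actual implementation: it assumes an InfoNCE-style contrastive loss between two embedding views (applied at the node level, and at the hyperedge level by mean-pooling each hyperedge's member nodes through the incidence matrix), and a DEC-style Student's t soft assignment as a stand-in for the differentiable clustering mechanism. All function names, the pooling choice, and the assignment kernel are assumptions for exposition.

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """InfoNCE loss between two views; row i of z1 and z2 are positives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = (z1 @ z2.T) / tau                      # cosine similarity logits
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives on the diagonal

def hyperedge_embeddings(Z, H):
    """Mean-pool node embeddings Z (n x d) over incidence matrix H (n x m)."""
    return (Z.T @ H / H.sum(axis=0, keepdims=True)).T

def soft_assignment(Z, centers, alpha=1.0):
    """DEC-style Student's t soft cluster assignment (differentiable)."""
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)      # rows sum to 1

# Toy example: 6 nodes, 3 hyperedges, two augmented views of the embeddings.
rng = np.random.default_rng(0)
Z1, Z2 = rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
H = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

# Dual-granularity contrastive objective: node level + hyperedge level.
loss = info_nce(Z1, Z2) + info_nce(hyperedge_embeddings(Z1, H),
                                   hyperedge_embeddings(Z2, H))
Q = soft_assignment(Z1, centers=Z1[:2])          # 2 illustrative centers
```

In an end-to-end setup, the contrastive loss and a clustering loss derived from `Q` (e.g., KL divergence to a sharpened target distribution) would be summed and minimized jointly, so the clustering signal shapes the embeddings rather than being applied post hoc.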