AI Summary
Existing hierarchical clustering methods suffer from two major limitations: the absence of a globally optimizable objective, and insufficient modeling of graph structure, often relying on static, predefined graphs. To address these, we propose a structure-enhanced continuous hierarchical clustering framework in hyperbolic space. Our method is the first to deeply integrate structural entropy with hyperbolic graph neural networks (HGNNs), introducing a differentiable continuous structural entropy loss for end-to-end optimization. It further incorporates dynamic graph structure learning, jointly optimizing node representations and the hierarchical structure. Specifically, nodes are encoded with HGNNs; structural entropy is relaxed into a continuous form in hyperbolic space via the hyperbolic lowest common ancestor (LCA); and the graph topology and hierarchical relationships are co-learned. On seven benchmark datasets, our approach significantly outperforms state-of-the-art methods, achieving superior clustering quality and stronger awareness of graph structure.
Abstract
Hierarchical clustering is a fundamental machine-learning technique for grouping data points into dendrograms. However, existing hierarchical clustering methods face two primary challenges: 1) most methods specify dendrograms without a global objective; 2) graph-based methods often neglect the significance of graph structure, optimizing objectives on complete or static predefined graphs. In this work, we propose Hyperbolic Continuous Structural Entropy neural networks, namely HypCSE, for structure-enhanced continuous hierarchical clustering. Our key idea is to map data points into hyperbolic space and minimize a relaxed continuous structural entropy (SE) on structure-enhanced graphs. Specifically, we encode graph vertices in hyperbolic space using hyperbolic graph neural networks and minimize an approximate SE defined on the graph embeddings. To make the SE objective differentiable, we reformulate it as a function of the lowest common ancestor (LCA) on trees and then relax it into a continuous SE (CSE) by exploiting the analogy between hyperbolic graph embeddings and partitioning trees. To ensure the graph structure effectively captures the hierarchy of data points for CSE calculation, we employ a graph structure learning (GSL) strategy that updates the graph structure during training. Extensive experiments on seven datasets demonstrate the superior performance of HypCSE.
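To make the hyperbolic-LCA idea concrete, here is a minimal sketch (not the paper's implementation) of the two geometric primitives the abstract relies on: geodesic distance in the Poincaré ball, and a continuous proxy for the "depth" of the LCA of two embedded points, taken as the smallest distance from the origin along a path between them. The function names, the straight-segment path approximation, and the sampling resolution are all illustrative assumptions.

```python
import numpy as np

def poincare_dist(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq / (denom + eps))

def hyperbolic_lca_depth(u, v, n_samples=51):
    """Continuous proxy for the depth of the LCA of u and v.

    In a hyperbolic embedding of a tree, the LCA of two leaves sits near the
    point of the connecting path closest to the origin. Here the path is
    approximated by the straight segment between u and v (a simplification;
    the true geodesic is a circular arc), and depth is measured as distance
    from the origin. Nearby points yield a deep (large) LCA depth; points on
    opposite sides of the origin yield a shallow (near-zero) one.
    """
    origin = np.zeros_like(u)
    ts = np.linspace(0.0, 1.0, n_samples)
    return min(poincare_dist(origin, (1 - t) * u + t * v) for t in ts)
```

A differentiable SE-style loss would then weight each graph edge by a smooth function of this LCA depth; in practice one would use a soft minimum and an autodiff framework rather than the hard `min` above.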