🤖 AI Summary
Existing self-supervised learning (SSL) methods struggle to capture inherent data hierarchies, while traditional hierarchical clustering (HC) suffers from poor generalizability due to rigid, handcrafted similarity metrics. This paper introduces the first end-to-end jointly optimized framework unifying SSL and HC, simultaneously learning robust latent representations and the underlying tree-structured data organization. Our approach integrates contrastive learning, a differentiable hierarchical clustering loss, explicit tree-structured modeling, and adaptive similarity encoding—enabling representation learning to explicitly encode semantic hierarchy. Evaluated on multimodal benchmarks, our method improves clustering purity by 12.3% and downstream retrieval mAP by 9.7%. The learned representations exhibit both hierarchical interpretability and cross-granularity generalizability, effectively bridging the long-standing decoupling between representation learning and structural modeling.
📝 Abstract
Analyzing large-scale datasets, especially involving complex and high-dimensional data like images, is particularly challenging. While self-supervised learning (SSL) has proven effective for learning representations from unlabelled data, it typically focuses on flat, non-hierarchical structures, missing the multi-level relationships present in many real-world datasets. Hierarchical clustering (HC) can uncover these relationships by organizing data into a tree-like structure, but it often relies on rigid similarity metrics that struggle to capture the complexity of diverse data types. To address these we envision $ exttt{InfoHier}$, a framework that combines SSL with HC to jointly learn robust latent representations and hierarchical structures. This approach leverages SSL to provide adaptive representations, enhancing HC's ability to capture complex patterns. Simultaneously, it integrates HC loss to refine SSL training, resulting in representations that are more attuned to the underlying information hierarchy. $ exttt{InfoHier}$ has the potential to improve the expressiveness and performance of both clustering and representation learning, offering significant benefits for data analysis, management, and information retrieval.