🤖 AI Summary
To address the curse of dimensionality and insufficient representation learning in high-dimensional clustering, this survey organizes autoencoder-based deep clustering methods into a unified view. It emphasizes the domain-agnostic nature and task-customizability of autoencoders as a "sandbox" for developing clustering algorithms: reconstruction learning is jointly optimized with clustering objectives, such as k-means coupling or co-designed loss functions, so that low-dimensional nonlinear representations and intrinsic cluster structure are learned consistently. Through comparative analysis of over a dozen algorithms, the survey clarifies their design motivations, modeling assumptions, and applicability boundaries, providing building blocks and a practical foundation for modern deep clustering approaches that aim to improve clustering accuracy, robustness, and interpretability.
📝 Abstract
Autoencoders offer a general way of learning low-dimensional, non-linear representations from data without labels. This is achieved without making any particular assumptions about the data type or other domain knowledge. This generality and domain agnosticism, combined with their simplicity, make autoencoders a perfect sandbox for researching and developing novel (deep) clustering algorithms. Clustering methods group data based on similarity, a task that benefits from the lower-dimensional representation learned by an autoencoder, mitigating the curse of dimensionality. Specifically, the combination of deep learning with clustering, called Deep Clustering, makes it possible to learn a representation tailored to a specific clustering task, leading to high-quality results. This survey provides an introduction to fundamental autoencoder-based deep clustering algorithms that serve as building blocks for many modern approaches.
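The k-means coupling mentioned above can be made concrete with a minimal sketch. The NumPy example below is illustrative only, not any specific surveyed algorithm: a *linear* autoencoder is trained by gradient descent on a combined loss (reconstruction error plus a penalty pulling latent codes toward their cluster centres), alternating with k-means updates of those centres. All variable names and hyperparameters are our own assumptions.

```python
import numpy as np

# Illustrative "k-means coupling" sketch: jointly minimize
#   ||X - dec(enc(X))||^2 + lam * ||enc(X) - mu[assign]||^2
# for a linear autoencoder, alternating gradient steps with k-means updates.

rng = np.random.default_rng(0)

# Toy data: two separated Gaussian blobs in 10 dimensions.
X = np.vstack([
    rng.normal(0.0, 0.2, size=(50, 10)),
    rng.normal(1.0, 0.2, size=(50, 10)),
])
n, d_in, d_lat, k = len(X), X.shape[1], 2, 2
lam, lr = 0.1, 0.01                      # clustering trade-off, step size

We = rng.normal(0, 0.1, (d_in, d_lat))   # encoder weights
Wd = rng.normal(0, 0.1, (d_lat, d_in))   # decoder weights
mu = rng.normal(0, 0.1, (k, d_lat))      # cluster centres in latent space

init_err = ((X @ We @ Wd - X) ** 2).mean()

for _ in range(500):
    Z = X @ We                           # encode
    R = Z @ Wd - X                       # reconstruction residual
    # Hard assignments: nearest centre in latent space.
    assign = ((Z[:, None, :] - mu[None]) ** 2).sum(-1).argmin(1)
    # Gradient step on encoder/decoder for the combined loss.
    gWd = Z.T @ R / n
    gWe = X.T @ (R @ Wd.T + lam * (Z - mu[assign])) / n
    We -= lr * gWe
    Wd -= lr * gWd
    # k-means step: move each non-empty centre to the mean of its points.
    for j in range(k):
        if (assign == j).any():
            mu[j] = Z[assign == j].mean(axis=0)

final_err = ((X @ We @ Wd - X) ** 2).mean()
print(init_err, final_err)               # reconstruction error should drop
```

Real deep clustering methods replace the linear maps with deep non-linear encoders/decoders and differ mainly in how the clustering term is designed and scheduled; the alternation between representation updates and cluster updates, however, is the shared pattern.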