🤖 AI Summary
To address the challenge of unsupervised discovery of unknown classes in open-world image classification, this paper proposes a fully unsupervised clustering framework. First, discriminative embeddings are extracted using Vision Transformers; then, manifold learning is employed to refine the intrinsic geometric structure of these embeddings, enabling nonlinear calibration and enhancement of the embedding space. Crucially, the method operates without any labels, prior knowledge of the number of classes, or exemplars from known categories, which makes it applicable to scenarios with arbitrary numbers of unknown classes. Evaluated on CIFAR-10, CIFAR-100, ImageNet-100, and Tiny ImageNet, the approach achieves state-of-the-art performance in both single-modal clustering and novel class discovery tasks, improving clustering accuracy and generalization under open-world conditions. To the best of our knowledge, this is the first work to realize fully unsupervised novel class discovery under the standard open-world setting.
📝 Abstract
Working with annotated data is the cornerstone of supervised learning. Nevertheless, labeling instances is a task that requires significant human effort. Several critical real-world applications complicate matters further: no matter how many labels have been identified in a task of interest, examples corresponding to novel classes may still appear in the future. Unsurprisingly, prior work in this so-called "open-world" context has focused largely on semi-supervised approaches.
Focusing on image classification, and somewhat paradoxically, we propose a fully unsupervised approach to the problem of determining the novel categories in a particular dataset. Our approach estimates the number of clusters from vector embeddings generated by Vision Transformers, which rely on attention mechanisms. Furthermore, we incorporate manifold learning techniques to refine these embeddings by exploiting the intrinsic geometry of the data, thereby enhancing the overall image clustering performance. Overall, we establish new state-of-the-art results on single-modal clustering and Novel Class Discovery on CIFAR-10, CIFAR-100, ImageNet-100, and Tiny ImageNet. We do so both when the number of clusters is known ahead of time and when it is not. The code is available at: https://github.com/DROWCULA/DROWCULA.
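
To make the idea of estimating the number of clusters from embeddings concrete, the following is a minimal sketch and not the authors' implementation. It assumes the embeddings are already available (hand-made 2-D points stand in for Vision Transformer embeddings here, and the manifold-learning refinement step is omitted), and it selects the number of clusters by maximizing the silhouette score of a plain k-means clustering. The silhouette criterion and the names `kmeans`, `silhouette`, `estimate_num_clusters`, and `toy_embeddings` are illustrative assumptions; the abstract does not say which estimator the paper uses.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    total = 0.0
    for i, p in enumerate(points):
        dists_by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                dists_by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists_by_cluster.pop(labels[i], None)
        if not own or not dists_by_cluster:
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(sum(d) / len(d) for d in dists_by_cluster.values())  # nearest other cluster
        total += (b - a) / max(a, b)
    return total / len(points)


def estimate_num_clusters(points, k_max=6):
    """Return the k in [2, k_max] whose k-means labelling has the best silhouette."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get)


# Toy stand-in for embeddings: three well-separated 3x3 grids of 2-D points.
toy_embeddings = [
    (cx + dx, cy + dy)
    for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    for dx in (-0.5, 0.0, 0.5)
    for dy in (-0.5, 0.0, 0.5)
]

print(estimate_num_clusters(toy_embeddings))
```

In a realistic setting, the toy points would be replaced by high-dimensional embeddings, optionally refined by a manifold-learning method before clustering, and a library implementation of k-means and the silhouette score would be preferable to this hand-written version.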