🤖 AI Summary
This work addresses the challenge of effectively leveraging large-scale unlabeled data containing unknown classes in semi-supervised hierarchical open-set classification. To this end, we propose a novel pseudo-labeling approach based on a teacher–student framework that introduces subtree pseudo-labels to provide structure-aware, reliable supervision signals. Additionally, we design an age-gating mechanism to mitigate overconfident pseudo-label predictions. As the first study to successfully apply semi-supervised learning to hierarchical open-set classification, our method achieves strong performance on the iNaturalist19 benchmark using only 20 labeled samples per class, surpassing self-supervised pretraining followed by fine-tuning and approaching fully supervised performance.
📝 Abstract
Hierarchical open-set classification handles previously unseen classes by assigning them to the most appropriate high-level category in a class taxonomy. We extend this paradigm to the semi-supervised setting, enabling the use of large-scale, uncurated datasets containing a mixture of known and unknown classes to improve hierarchical open-set performance. To this end, we propose a teacher–student framework based on pseudo-labeling. Two key components are introduced: 1) subtree pseudo-labels, which provide reliable supervision in the presence of unknown data, and 2) age-gating, a mechanism that mitigates overconfidence in pseudo-labels. Experiments show that our framework outperforms self-supervised pretraining followed by supervised adaptation, and even matches the fully supervised counterpart when using only 20 labeled samples per class on the iNaturalist19 benchmark. Our code is available at https://github.com/walline/semihoc.
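To make the pseudo-labeling core of such a teacher–student setup concrete, here is a minimal sketch of confidence-thresholded pseudo-label selection on unlabeled data. The function name, the threshold value, and the flat-class setting are illustrative assumptions; the paper's subtree pseudo-labels and age-gating mechanism are not reproduced here.

```python
import numpy as np

def select_pseudo_labels(teacher_probs, threshold=0.95):
    """Keep only unlabeled samples whose teacher prediction is confident.

    teacher_probs: (N, C) array of teacher softmax outputs on unlabeled data.
    Returns the indices of retained samples and their hard pseudo-labels.
    Hypothetical helper for illustration only.
    """
    confidence = teacher_probs.max(axis=1)          # per-sample max probability
    keep = np.where(confidence >= threshold)[0]     # confident samples only
    pseudo_labels = teacher_probs[keep].argmax(axis=1)
    return keep, pseudo_labels

# Toy example: three unlabeled samples, two classes.
probs = np.array([[0.97, 0.03],
                  [0.60, 0.40],
                  [0.02, 0.98]])
idx, labels = select_pseudo_labels(probs)
# Only the first and third samples pass the 0.95 threshold.
```

In the paper's setting, the retained pseudo-labels would supervise the student network; the subtree pseudo-labels generalize the hard labels above to internal nodes of the class taxonomy for samples the teacher cannot confidently assign to a leaf class.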