🤖 AI Summary
Unsupervised object discovery (UOD) faces two core challenges: deciding whether a discovered region is foreground or background, and under- or over-segmentation caused by the unknown number of objects in an image. To address these, we propose UnionCut and UnionSeg. UnionCut is a foreground prior built on min-cut and ensemble methods that detects the union of all foreground areas in an image. It lets a UOD algorithm check whether a discovered region is foreground and stop discovery once most of the foreground union has been segmented, rather than running a fixed number of iterations. UnionSeg is a transformer distilled from UnionCut that outputs the foreground union more efficiently and accurately. Neither requires manual annotations. When combined with UnionCut or UnionSeg, previous state-of-the-art UOD methods improve on three tasks (single object discovery, saliency detection, and self-supervised instance segmentation) across various benchmarks.
📝 Abstract
Unsupervised object discovery (UOD) aims to detect and segment objects in 2D images without handcrafted annotations. Recent progress in self-supervised representation learning has led to some success in UOD algorithms. However, the absence of ground truth poses two challenges for existing UOD methods: 1) determining whether a discovered region is foreground or background, and 2) knowing how many objects remain undiscovered. To address these two problems, previous solutions rely on foreground priors to decide whether a discovered region is foreground, and run either one or a fixed number of discovery iterations. However, the existing foreground priors are heuristic and not always robust, and a fixed number of discoveries leads to under- or over-segmentation, since the number of objects varies from image to image. This paper introduces UnionCut, a robust and well-grounded foreground prior based on min-cut and ensemble methods that detects the union of foreground areas of an image, allowing UOD algorithms to identify foreground objects and stop discovery once the majority of the foreground union in the image is segmented. In addition, we propose UnionSeg, a distilled transformer of UnionCut that outputs the foreground union more efficiently and accurately. Our experiments show that, when combined with UnionCut or UnionSeg, previous state-of-the-art UOD methods improve in single object discovery, saliency detection, and self-supervised instance segmentation on various benchmarks. The code is available at https://github.com/YFaris/UnionCut.
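The two roles the abstract gives the foreground union (filtering out background regions and deciding when to stop) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the representation of masks as sets of `(row, col)` pixels, and both thresholds (`fg_thresh`, `coverage_thresh`, set to 0.5 here) are assumptions made for illustration. The candidate masks stand in for the output of any UOD method, and `union` stands in for a foreground union produced by UnionCut or UnionSeg.

```python
from typing import Iterable, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)
Mask = Set[Pixel]


def foreground_overlap(mask: Mask, union: Mask) -> float:
    """Fraction of a candidate mask that lies inside the foreground union."""
    return len(mask & union) / len(mask) if mask else 0.0


def discover_objects(
    candidates: Iterable[Mask],
    union: Mask,
    fg_thresh: float = 0.5,        # assumed value, not from the paper
    coverage_thresh: float = 0.5,  # assumed reading of "the majority"
) -> List[Mask]:
    """Keep foreground candidates until most of the foreground union is covered."""
    kept: List[Mask] = []
    covered: Mask = set()
    for mask in candidates:
        # Stop once the majority of the foreground union is already segmented.
        if union and len(covered) / len(union) > coverage_thresh:
            break
        # Reject candidates that fall mostly outside the foreground union.
        if foreground_overlap(mask, union) >= fg_thresh:
            kept.append(mask)
            covered |= mask & union
    return kept


# Toy example: the foreground union is the top two rows of a 4x4 image.
union = {(r, c) for r in range(2) for c in range(4)}
a = {(0, 0), (0, 1), (1, 0), (1, 1)}  # foreground object
b = {(3, 0), (3, 1)}                  # background region
c = {(0, 2), (0, 3)}                  # foreground object
d = {(1, 2), (1, 3)}                  # never reached: discovery stops first
kept = discover_objects([a, b, c, d], union)
```

In this toy run `a` and `c` are kept, `b` is rejected as background, and the loop ends before `d` because three quarters of the foreground union is already covered. A fixed iteration count would instead have either stopped too early or kept going regardless of coverage.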