🤖 AI Summary
Existing open-world novel class discovery methods rely on explicit 3D segmentation maps, which are noisy and prone to holes, and on dense supervision. To address this, we propose NeurNCD, the first framework for novel class discovery built around Embedding-NeRF, an implicit neural radiance field (NeRF) model that replaces explicit 3D segmentation maps. Combined with KL divergence, Embedding-NeRF aggregates semantic embeddings and entropy in a visual embedding space, while feature query, feature modulation, and clustering exchange information between a pre-trained semantic segmentation network and the implicit representation. NeurNCD requires neither densely labelled training data nor human interaction to produce sparse labels, and it delivers strong segmentation in both open-world and closed-world settings. On NYUv2 and Replica, it significantly outperforms state-of-the-art methods.
📝 Abstract
Discovering novel classes in open-world settings is crucial for real-world applications. Traditional explicit representations, such as object descriptors or 3D segmentation maps, are discrete, prone to holes, and noisy, which hinders accurate novel class discovery. To address these challenges, we introduce NeurNCD, the first versatile and data-efficient framework for novel class discovery. In place of traditional explicit 3D segmentation maps, NeurNCD employs a purpose-built Embedding-NeRF model combined with KL divergence to aggregate semantic embeddings and entropy in the visual embedding space. NeurNCD also integrates several key components, including feature query, feature modulation, and clustering, which facilitate efficient feature augmentation and information exchange between the pre-trained semantic segmentation network and the implicit neural representations. As a result, our framework achieves superior segmentation performance in both open-world and closed-world settings without relying on densely labelled datasets for supervised training or on human interaction to generate sparse label supervision. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches on the NYUv2 and Replica datasets.
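The abstract names KL divergence as part of the entropy aggregation but does not say which distributions are compared. As a minimal sketch only, assuming the two inputs are per-pixel class probabilities from the pre-trained segmentation network and from the implicit field, a per-pixel KL map could be computed as below. The function names, array shapes, and the 0.5 threshold are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last (class) axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl_divergence(p, q, eps=1e-8):
    """Per-pixel KL(p || q) over the last (class) axis.

    Both inputs are clipped away from zero and renormalised so the
    logarithms stay finite.
    """
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    p = p / p.sum(axis=-1, keepdims=True)
    q = q / q.sum(axis=-1, keepdims=True)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)


# Hypothetical inputs: a 4x4 image with 5 known classes.
rng = np.random.default_rng(0)
seg_probs = softmax(rng.normal(size=(4, 4, 5)))    # assumed: segmentation network output
field_probs = softmax(rng.normal(size=(4, 4, 5)))  # assumed: rendered from the implicit field

kl_map = kl_divergence(seg_probs, field_probs)     # shape (4, 4), one value per pixel

# Assumption: pixels where the two predictions disagree strongly are
# candidates for novel classes. The threshold is arbitrary.
novel_mask = kl_map > 0.5
```

A high value in `kl_map` marks a pixel where the two predictions disagree. How NeurNCD actually uses this signal, and how it feeds into feature query, modulation, and clustering, is not specified in the abstract.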