Enforcing View-Consistency in Class-Agnostic 3D Segmentation Fields

📅 2024-08-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenging problem of category-agnostic 3D scene segmentation in neural radiance fields (NeRFs). We propose an end-to-end learning framework that requires no category labels, avoids post-hoc clustering and its hyperparameter tuning, and directly infers a 3D segmentation with no supervision beyond inconsistent 2D masks. Our method introduces a multi-slot 3D segmentation field, jointly optimized via a spatial consistency regularizer that aligns slots with multi-view 2D masks and a contrastive objective that enforces cross-view segmentation coherence. To our knowledge, this is the first approach enabling robust and generalizable learning of category-agnostic 3D segmentation fields. It produces high-fidelity 3D panoptic segmentations of complex real-world scenes and yields 3D assets with coupled geometry and semantics, directly usable for constructing virtual environments.

📝 Abstract
Radiance Fields have become a powerful tool for modeling 3D scenes from multiple images. However, they remain difficult to segment into semantically meaningful regions. Some methods work well using 2D semantic masks, but they generalize poorly to class-agnostic segmentations. More recent methods circumvent this issue by using contrastive learning to optimize a high-dimensional 3D feature field instead. However, recovering a segmentation then requires clustering and fine-tuning the associated hyperparameters. In contrast, we aim to identify the necessary changes in segmentation field methods to directly learn a segmentation field while being robust to inconsistent class-agnostic masks, successfully decomposing the scene into a set of objects of any class. By introducing an additional spatial regularization term and restricting the field to a limited number of competing object slots against which masks are matched, a meaningful object representation emerges that best explains the 2D supervision. Our experiments demonstrate the ability of our method to generate 3D panoptic segmentations on complex scenes, and extract high-quality 3D assets from radiance fields that can then be used in virtual 3D environments.
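The abstract's key mechanism is restricting the field to a limited number of competing object slots against which 2D masks are matched. The paper does not spell out the matching procedure here, so the following is a minimal illustrative sketch, assuming rendered per-slot probability maps are matched one-to-one to class-agnostic binary masks via Hungarian assignment on soft IoU; the function name and all details are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_slots_to_masks(slot_probs, masks):
    """Match K rendered slot probability maps to M binary 2D masks.

    slot_probs: (K, H, W) per-pixel slot probabilities rendered from the field.
    masks:      (M, H, W) binary class-agnostic masks; their identities may be
                inconsistent across views, so correspondence must be re-derived
                per view by matching rather than assumed.
    Returns (slot_idx, mask_idx) index arrays for the best one-to-one pairing.
    """
    K, M = slot_probs.shape[0], masks.shape[0]
    iou = np.zeros((K, M))
    for k in range(K):
        for m in range(M):
            inter = np.minimum(slot_probs[k], masks[m]).sum()
            union = np.maximum(slot_probs[k], masks[m]).sum()
            iou[k, m] = inter / (union + 1e-8)  # soft IoU between slot and mask
    # Hungarian matching on negative IoU yields the assignment that best
    # explains this view's masks; a supervision loss would then be applied
    # only between matched pairs.
    slot_idx, mask_idx = linear_sum_assignment(-iou)
    return slot_idx, mask_idx
```

Because the matching is recomputed independently in every view, the slots themselves can stay view-consistent even when the 2D mask identities are not.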
Problem

Research questions and friction points this paper is trying to address.

Enforcing view-consistency in class-agnostic 3D segmentation fields
Directly learning segmentation fields from inconsistent 2D masks
Decomposing 3D scenes into meaningful objects without class constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing spatial regularization for segmentation fields
Limiting object slots for mask matching
Generating 3D panoptic segmentations from radiance fields
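The exact form of the spatial regularization term above is not given on this page; one plausible form, sketched below purely as an assumption, penalizes divergence between the slot distributions the field predicts at a 3D point and at a slightly jittered copy of it, encouraging nearby points to belong to the same object slot.

```python
import numpy as np


def spatial_consistency_loss(slot_logits, jittered_logits):
    """Hypothetical spatial regularizer for a multi-slot segmentation field.

    slot_logits:     (N, K) slot logits at N sampled 3D points.
    jittered_logits: (N, K) slot logits at small random offsets of those points.
    Returns a scalar penalty that is zero when the distributions agree.
    """
    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    p = softmax(slot_logits)      # slot distribution at each point
    q = softmax(jittered_logits)  # slot distribution at the jittered point
    # Squared-difference penalty between neighboring distributions; a
    # spatially smooth field incurs (near-)zero loss.
    return float(((p - q) ** 2).sum(axis=-1).mean())
```

In training, such a term would be added to the mask-matching loss, so that a smooth, object-level decomposition emerges rather than a noisy per-pixel one.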