v-CLR: View-Consistent Learning for Open-World Instance Segmentation

📅 2025-04-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In open-world instance segmentation, models often over-rely on texture and other appearance cues, hindering generalization to novel classes with unseen appearances. To address this, we propose v-CLR, a cross-view consistency learning framework. v-CLR generates structurally preserved yet texture-augmented multi-view images and leverages class-agnostic object proposals from off-the-shelf unsupervised models (e.g., DINO, SAM) for object-level cross-view feature matching. It further enforces cross-view feature consistency and employs contrastive representation learning to encourage appearance-invariant, structure-sensitive instance representations. To our knowledge, v-CLR is the first unsupervised cross-view consistency learning framework specifically designed for open-world instance segmentation. It achieves state-of-the-art performance on both cross-category and cross-dataset open-world segmentation benchmarks, significantly improving detection recall and segmentation accuracy for novel-class instances with previously unseen textures.
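The object-level cross-view matching step above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: `match_objects`, the greedy argmax strategy, and the raw feature vectors are all illustrative stand-ins for the proposal features the summary describes.

```python
import numpy as np

def match_objects(feats_a, feats_b):
    """Greedily pair object features across two views of the same image
    by cosine similarity (illustrative; the paper's matching may differ)."""
    # L2-normalize each object feature vector so dot products are cosines.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T  # (n_a, n_b) cosine-similarity matrix
    # For each object in view A, take its most similar object in view B.
    return [(i, int(np.argmax(sim[i]))) for i in range(sim.shape[0])]
```

Because the two views share the underlying image structure, matched pairs should correspond to the same physical object despite the texture augmentation.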

📝 Abstract
In this paper, we address the challenging problem of open-world instance segmentation. Existing works have shown that vanilla visual networks are biased toward learning appearance information, e.g., texture, to recognize objects. This implicit bias causes the model to fail in detecting novel objects with unseen textures in the open-world setting. To address this challenge, we propose a learning framework, called view-Consistent LeaRning (v-CLR), which aims to enforce the model to learn appearance-invariant representations for robust instance segmentation. In v-CLR, we first introduce additional views for each image, where the texture undergoes significant alterations while preserving the image's underlying structure. We then encourage the model to learn the appearance-invariant representation by enforcing the consistency between object features across different views, for which we obtain class-agnostic object proposals using off-the-shelf unsupervised models that possess strong object-awareness. These proposals enable cross-view object feature matching, greatly reducing the appearance dependency while enhancing the object-awareness. We thoroughly evaluate our method on public benchmarks under both cross-class and cross-dataset settings, achieving state-of-the-art performance. Project page: https://visual-ai.github.io/vclr
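The consistency objective the abstract describes, enforcing agreement between matched object features across texture-altered views, can be sketched as a simple loss. This is a minimal illustration under assumptions: `consistency_loss` and the choice of mean (1 − cosine similarity) are hypothetical; the paper may use a different distance or a contrastive formulation.

```python
import numpy as np

def consistency_loss(feats_a, feats_b):
    """Mean (1 - cosine similarity) over already-matched object feature
    pairs from two views; 0 when features agree, up to 2 when opposed."""
    # Row i of feats_a is assumed matched to row i of feats_b.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    cos = np.sum(a * b, axis=1)  # per-pair cosine similarity
    return float(np.mean(1.0 - cos))
```

Minimizing such a loss pushes the backbone to produce the same object representation regardless of texture, which is the appearance-invariance the paper targets.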
Problem

Research questions and friction points this paper is trying to address.

Open-world instance segmentation with novel objects
Reducing texture bias in visual networks
Learning appearance-invariant object representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

View-consistent learning for appearance-invariant segmentation
Texture-altered views enhance object structure learning
Unsupervised object proposals for cross-view feature matching