v-CLR: View-Consistent Learning for Open-World Instance Segmentation

📅 2025-04-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In open-world instance segmentation, models often over-rely on texture and other appearance cues, hindering generalization to novel classes with unseen appearances. To address this, we propose v-CLR, a cross-view consistency learning framework. v-CLR generates structurally preserved yet texture-augmented multi-view images and leverages class-agnostic object proposals from off-the-shelf unsupervised models (e.g., DINO, SAM) for object-level cross-view feature matching. It further enforces cross-view feature consistency and employs contrastive representation learning to encourage appearance-invariant, structure-sensitive instance representations. To our knowledge, v-CLR is the first unsupervised cross-view consistency learning framework specifically designed for open-world instance segmentation. It achieves state-of-the-art performance on both cross-category and cross-dataset open-world segmentation benchmarks, significantly improving detection recall and segmentation accuracy for novel-class instances with previously unseen textures.
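The object-level cross-view matching step above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: `match_objects`, the greedy argmax strategy, and the raw feature vectors are all illustrative stand-ins for the proposal features the summary describes.

```python
import numpy as np

def match_objects(feats_a, feats_b):
    """Greedily pair object features across two views of the same image
    by cosine similarity (illustrative; the paper's matching may differ)."""
    # L2-normalize each object feature vector so dot products are cosines.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T  # (n_a, n_b) cosine-similarity matrix
    # For each object in view A, take its most similar object in view B.
    return [(i, int(np.argmax(sim[i]))) for i in range(sim.shape[0])]
```

Because the two views share the underlying image structure, matched pairs should correspond to the same physical object despite the texture augmentation.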

📝 Abstract
In this paper, we address the challenging problem of open-world instance segmentation. Existing works have shown that vanilla visual networks are biased toward learning appearance information, e.g., texture, to recognize objects. This implicit bias causes the model to fail in detecting novel objects with unseen textures in the open-world setting. To address this challenge, we propose a learning framework, called view-Consistent LeaRning (v-CLR), which aims to enforce the model to learn appearance-invariant representations for robust instance segmentation. In v-CLR, we first introduce additional views for each image, where the texture undergoes significant alterations while preserving the image's underlying structure. We then encourage the model to learn the appearance-invariant representation by enforcing the consistency between object features across different views, for which we obtain class-agnostic object proposals using off-the-shelf unsupervised models that possess strong object-awareness. These proposals enable cross-view object feature matching, greatly reducing the appearance dependency while enhancing the object-awareness. We thoroughly evaluate our method on public benchmarks under both cross-class and cross-dataset settings, achieving state-of-the-art performance. Project page: https://visual-ai.github.io/vclr
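The consistency objective the abstract describes, enforcing agreement between matched object features across texture-altered views, can be sketched as a simple loss. This is a minimal illustration under assumptions: `consistency_loss` and the choice of mean (1 − cosine similarity) are hypothetical; the paper may use a different distance or a contrastive formulation.

```python
import numpy as np

def consistency_loss(feats_a, feats_b):
    """Mean (1 - cosine similarity) over already-matched object feature
    pairs from two views; 0 when features agree, up to 2 when opposed."""
    # Row i of feats_a is assumed matched to row i of feats_b.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    cos = np.sum(a * b, axis=1)  # per-pair cosine similarity
    return float(np.mean(1.0 - cos))
```

Minimizing such a loss pushes the backbone to produce the same object representation regardless of texture, which is the appearance-invariance the paper targets.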
Problem

Research questions and friction points this paper is trying to address.

Open-world instance segmentation with novel objects
Reducing texture bias in visual networks
Learning appearance-invariant object representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

View-consistent learning for appearance-invariant segmentation
Texture-altered views enhance object structure learning
Unsupervised object proposals for cross-view feature matching