Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts

πŸ“… 2024-07-02
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 2
✨ Influential: 0
πŸ€– AI Summary
Problem: Existing multi-class few-shot semantic segmentation methods suffer from weak generalization, limited prompt modality support (e.g., only points or boxes), and strong dependence on fixed N-way K-shot configurations. Method: We propose the first unified end-to-end framework for multi-class few-shot segmentation, built upon a vision-language promptable Transformer architecture that accepts arbitrary combinations of point, box, and mask prompts. It introduces multi-prompt fusion encoding and cross-class contrastive learning to construct a class-agnostic, unified support-set representation. Contribution/Results: Our key innovation is the complete decoupling of the model from both the number of support classes (N) and the number of shots per class (K), enabling truly generalizable few-shot segmentation. Evaluated on COCO-20i, our method achieves state-of-the-art performance while significantly reducing training overhead and maintaining robustness across all settings, from 1-way 1-shot to arbitrary N-way K-shot configurations.
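The decoupling from N (classes) and K (shots) described above can be illustrated with a minimal, hypothetical sketch. This is not the paper's actual code: the prompt encoder here is a random stand-in, and all function names are illustrative assumptions. The idea it shows is the one stated in the summary: pool the K prompt embeddings of each class into a single class-agnostic embedding, then score query pixels against however many class embeddings exist, so neither N nor K is baked into the model.

```python
import numpy as np

def encode_prompt(prompt):
    # Hypothetical stand-in for the paper's prompt encoder: maps any
    # prompt type (point, box, or mask) to a fixed-size embedding.
    rng = np.random.default_rng(abs(hash(str(prompt))) % (2**32))
    return rng.standard_normal(16)

def class_embeddings(support_set):
    # Pool the K prompt embeddings of each class into one embedding,
    # making the representation independent of the shot count K.
    return {c: np.mean([encode_prompt(p) for p in prompts], axis=0)
            for c, prompts in support_set.items()}

def segment(query_feats, support_set):
    # query_feats: (H, W, D) per-pixel features from a backbone.
    # Each pixel gets the class whose embedding it matches best;
    # the loop over classes works for any number of classes N.
    embs = class_embeddings(support_set)
    classes = list(embs)
    scores = np.stack([query_feats @ embs[c] for c in classes], axis=-1)
    return np.array(classes)[scores.argmax(axis=-1)]
```

With this structure, the same `segment` call handles a 1-way 1-shot support set or a support set mixing different prompt types and shot counts per class, which is the "universal" behavior the paper claims for its trained model.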

πŸ“ Abstract
We present Label Anything, an innovative neural network architecture designed for few-shot semantic segmentation (FSS) that demonstrates remarkable generalizability across multiple classes with minimal examples required per class. Diverging from traditional FSS methods that predominantly rely on masks for annotating support images, Label Anything introduces varied visual prompts -- points, bounding boxes, and masks -- thereby enhancing the framework's versatility and adaptability. Unique to our approach, Label Anything is engineered for end-to-end training across multi-class FSS scenarios, efficiently learning from diverse support set configurations without retraining. This approach enables a "universal" application to various FSS challenges, ranging from 1-way 1-shot to complex N-way K-shot configurations, while remaining agnostic to the specific number of class examples. This innovative training strategy reduces computational requirements and substantially improves the model's adaptability and generalization across diverse segmentation tasks. Our comprehensive experimental validation, particularly achieving state-of-the-art results on the COCO-20^i benchmark, underscores Label Anything's robust generalization and flexibility. The source code is publicly available at: https://github.com/pasqualedem/LabelAnything.
Problem

Research questions and friction points this paper is trying to address.

Existing FSS methods generalize weakly across multiple classes
Support for a single prompt modality (typically masks) limits adaptability
Dependence on fixed N-way K-shot configurations forces retraining per setting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Accepts arbitrary combinations of point, box, and mask prompts
Trains end-to-end across multi-class few-shot configurations
Generalizes to any N-way K-shot setting without retraining