Generalized Category Discovery via Token Manifold Capacity Learning

📅 2025-05-20

🤖 AI Summary

Generalized Category Discovery (GCD) in open-world settings suffers from weak clustering robustness and a loss of representation diversity when known and unknown categories coexist. To address this, we propose the Maximum Token Manifold Capacity (MTMC) criterion, the first to quantify manifold capacity via the nuclear norm. MTMC jointly optimizes intra-class richness and inter-class separability, overcoming the limitation of conventional methods that solely minimize intra-cluster variance. Our approach builds on a self-supervised contrastive learning framework and incorporates singular-value-based nuclear norm regularization, dynamic class-token aggregation, and capacity-aware prototype updating. Evaluated on both coarse- and fine-grained benchmarks, our method improves clustering accuracy by 3.2% and the F1-score of category number estimation by 5.8%. It mitigates dimensional collapse while improving representation completeness and discriminability.

📝 Abstract

Generalized category discovery (GCD) is essential for improving deep learning models' robustness in open-world scenarios by clustering unlabeled data containing both known and novel categories. Traditional GCD methods focus on minimizing intra-cluster variations, often sacrificing manifold capacity, which limits the richness of intra-class representations. In this paper, we propose a novel approach, Maximum Token Manifold Capacity (MTMC), that prioritizes maximizing the manifold capacity of class tokens to preserve the diversity and complexity of data. MTMC uses the nuclear norm (the sum of singular values) of the class-token representations as a measure of manifold capacity, ensuring that the representation of samples remains informative and well-structured. This method enhances the discriminability of clusters, allowing the model to capture detailed semantic features and avoid the loss of critical information during clustering. Through theoretical analysis and extensive experiments on coarse- and fine-grained datasets, we demonstrate that MTMC outperforms existing GCD methods, improving both clustering accuracy and the estimation of category numbers. The integration of MTMC leads to more complete representations, better inter-class separability, and a reduction in dimensional collapse, establishing MTMC as a vital component for robust open-world learning. Code is available at github.com/lytang63/MTMC.
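To make the nuclear-norm measure concrete, the following is a minimal sketch, not the authors' implementation. It assumes class tokens are stacked into an `(n_samples, dim)` matrix with L2-normalized rows; the helper names `nuclear_norm` and `l2_normalize` and the synthetic data are hypothetical. It compares a diverse set of tokens with a nearly collapsed set that lies along one direction.

```python
import numpy as np


def nuclear_norm(tokens: np.ndarray) -> float:
    """Return the sum of singular values of an (n_samples, dim) matrix."""
    singular_values = np.linalg.svd(tokens, compute_uv=False)
    return float(singular_values.sum())


def l2_normalize(tokens: np.ndarray) -> np.ndarray:
    """Scale every row to unit Euclidean length."""
    return tokens / np.linalg.norm(tokens, axis=1, keepdims=True)


rng = np.random.default_rng(0)
n_samples, dim = 64, 16  # illustrative sizes, not taken from the paper

# Diverse tokens: rows point in many different directions.
diverse = l2_normalize(rng.normal(size=(n_samples, dim)))

# Collapsed tokens: every row is a tiny perturbation of one direction,
# so the matrix is close to rank one (dimensional collapse).
direction = rng.normal(size=(1, dim))
collapsed = l2_normalize(direction + 0.01 * rng.normal(size=(n_samples, dim)))

diverse_capacity = nuclear_norm(diverse)
collapsed_capacity = nuclear_norm(collapsed)

# With unit-norm rows the nuclear norm lies between sqrt(n_samples)
# (rank one) and sqrt(n_samples * min(n_samples, dim)) (flat spectrum).
lower_bound = np.sqrt(n_samples)
upper_bound = np.sqrt(n_samples * min(n_samples, dim))

print(f"diverse:   {diverse_capacity:.2f}")
print(f"collapsed: {collapsed_capacity:.2f}")
print(f"bounds:    [{lower_bound:.2f}, {upper_bound:.2f}]")
```

Under these assumptions the collapsed set scores near the lower bound and the diverse set scores much closer to the upper bound, which is the intuition behind treating a larger nuclear norm as higher capacity. A training objective could, for instance, subtract a weighted nuclear-norm term from its loss; how MTMC actually integrates the term is defined in the paper and its repository, not here.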

Problem

Research questions and friction points this paper is trying to address.

Enhancing deep learning robustness in open-world scenarios

Preserving intra-class diversity via manifold capacity maximization

Improving clustering accuracy and category number estimation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Maximizes token manifold capacity for diversity

Uses nuclear norm to measure manifold capacity

Enhances discriminability and avoids information loss
