A Hidden Stumbling Block in Generalized Category Discovery: Distracted Attention

📅 2025-07-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
In generalized category discovery (GCD), models trained on unlabeled data are highly susceptible to irrelevant background interference, which disperses attention and degrades feature discriminability. To address this issue, systematically for the first time in GCD, the authors propose a lightweight Attention Focusing (AF) mechanism. AF comprises two core components working in a cascade: Token Importance Measurement (TIME), which scores tokens across multiple scales, and Token Adaptive Pruning (TAP), which uses those scores to suppress non-informative background tokens. The AF module is plug-and-play, compatible with mainstream Transformer architectures, and requires no additional annotations. When integrated into the SimGCD baseline, AF improves classification accuracy on known and unknown categories by up to 15.4% with minimal computational overhead, enhancing robustness and generalization in GCD settings.

📝 Abstract
Generalized Category Discovery (GCD) aims to classify unlabeled data from both known and unknown categories by leveraging knowledge from labeled known categories. While existing methods have made notable progress, they often overlook a hidden stumbling block in GCD: distracted attention. Specifically, when processing unlabeled data, models tend to focus not only on key objects in the image but also on task-irrelevant background regions, leading to suboptimal feature extraction. To remove this stumbling block, we propose Attention Focusing (AF), an adaptive mechanism designed to sharpen the model's focus by pruning non-informative tokens. AF consists of two simple yet effective components, Token Importance Measurement (TIME) and Token Adaptive Pruning (TAP), working in a cascade. TIME quantifies token importance across multiple scales, while TAP prunes non-informative tokens using the multi-scale importance scores provided by TIME. AF is a lightweight, plug-and-play module that integrates seamlessly into existing GCD methods. When incorporated into one prominent GCD method, SimGCD, AF achieves up to 15.4% performance improvement over the baseline with minimal computational overhead. The implementation code is available at https://github.com/Afleve/AFGCD.
Problem

Research questions and friction points this paper is trying to address.

Models focus on irrelevant background in unlabeled data
Distracted attention leads to suboptimal feature extraction
Existing GCD methods overlook attention distraction issue
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention Focusing mechanism sharpens model focus
Token Importance Measurement quantifies multi-scale importance
Token Adaptive Pruning removes non-informative tokens
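The paper itself does not publish its equations here, but the TIME-then-TAP cascade described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the authors' implementation (see their repository for that): the function names `token_importance` and `prune_tokens`, the use of CLS-to-patch attention as the importance signal, and the fixed keep ratio are all illustrative choices.

```python
import numpy as np

def token_importance(attn_maps):
    """TIME-style scoring (illustrative): average the CLS-to-patch attention
    over heads, then over several layers/scales, to rate each patch token.

    attn_maps: list of arrays of shape (heads, tokens, tokens), one per scale.
    Returns an array of shape (tokens - 1,) with one score per patch token.
    """
    # Row 0 is the CLS query; column 0 (CLS-to-CLS) is dropped.
    per_scale = [a[:, 0, 1:].mean(axis=0) for a in attn_maps]
    return np.stack(per_scale).mean(axis=0)

def prune_tokens(tokens, scores, keep_ratio=0.7):
    """TAP-style pruning (illustrative): keep the CLS token plus the
    top-k patch tokens ranked by the importance scores.

    tokens: array of shape (tokens, dim), row 0 being CLS.
    """
    k = max(1, int(round(keep_ratio * scores.shape[0])))
    keep = np.sort(np.argsort(scores)[::-1][:k])  # top-k, original order
    return np.concatenate([tokens[:1], tokens[1:][keep]], axis=0)

# Toy usage: 3 scales, 6 heads, 1 CLS + 16 patch tokens of dim 32.
rng = np.random.default_rng(0)
attn = [rng.random((6, 17, 17)) for _ in range(3)]
tokens = rng.random((17, 32))
scores = token_importance(attn)
pruned = prune_tokens(tokens, scores, keep_ratio=0.5)
```

In a real Transformer the surviving tokens would simply be passed to the next block, which is what makes this kind of pruning plug-and-play: it changes the token set, not the architecture.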