Video-based Generalized Category Discovery via Memory-Guided Consistency-Aware Contrastive Learning

📅 2025-09-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses unreliable novel-class identification in Generalized Category Discovery (GCD), which stems from the limited semantic information in static images. It introduces Video-GCD, a new task that extends GCD to the video domain. To model temporal multi-view information, the authors propose a Memory-guided Consistency-aware Contrastive Learning (MCCL) framework comprising: (1) a dual-level memory buffer (feature- and logit-level) to support cross-frame consistency estimation; (2) a consistency-guided voting mechanism that closes the loop between representation learning and consistency modeling; and (3) a weighted contrastive loss that enhances intra-class compactness and inter-class separability. Experiments on action recognition and bird classification video benchmarks show that the method significantly outperforms GCD baselines adapted from image-based settings, supporting the importance of temporal modeling for category discovery in videos.

📝 Abstract
Generalized Category Discovery (GCD) is an emerging and challenging open-world problem that has garnered increasing attention in recent years. Most existing GCD methods focus on discovering categories in static images. However, relying solely on static visual content is often insufficient to reliably discover novel categories. To bridge this gap, we extend the GCD problem to the video domain and introduce a new setting, termed Video-GCD. In this setting, effectively integrating multi-perspective information across time is crucial for accurate category discovery. To tackle this challenge, we propose a novel Memory-guided Consistency-aware Contrastive Learning (MCCL) framework, which explicitly captures temporal-spatial cues and incorporates them into contrastive learning through a consistency-guided voting mechanism. MCCL consists of two core components: Consistency-Aware Contrastive Learning (CACL) and Memory-Guided Representation Enhancement (MGRE). CACL exploits multi-perspective temporal features to estimate consistency scores between unlabeled instances, which are then used to weight the contrastive loss. MGRE introduces a dual-level memory buffer that maintains both feature-level and logit-level representations, providing global context to enhance intra-class compactness and inter-class separability. This in turn refines the consistency estimation in CACL, forming a mutually reinforcing feedback loop between representation learning and consistency modeling. To facilitate a comprehensive evaluation, we construct a new and challenging Video-GCD benchmark, which includes action recognition and bird classification video datasets. Extensive experiments demonstrate that our method significantly outperforms competitive GCD approaches adapted from image-based settings, highlighting the importance of temporal information for discovering novel categories in videos. The code will be publicly available.
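The consistency-weighted loss described for CACL can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the scoring rule or the loss formula, so the histogram-intersection vote in `consistency_score`, the InfoNCE-style form of `weighted_contrastive_loss`, and the default `temperature` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def consistency_score(frame_preds_a, frame_preds_b):
    """Hypothetical voting rule: overlap of the per-frame label histograms
    of two videos (1.0 = identical vote distribution, 0.0 = disjoint)."""
    hist_a, hist_b = Counter(frame_preds_a), Counter(frame_preds_b)
    labels = set(hist_a) | set(hist_b)
    return sum(
        min(hist_a[c] / len(frame_preds_a), hist_b[c] / len(frame_preds_b))
        for c in labels
    )


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_contrastive_loss(features, weights, temperature=0.1):
    """InfoNCE-style loss where weights[i][j] says how strongly video j
    is treated as a positive for anchor i (assumed form, not the paper's)."""
    n = len(features)
    total, counted = 0.0, 0
    for i in range(n):
        # Scaled similarities from anchor i to every other video.
        logits = {
            k: cosine(features[i], features[k]) / temperature
            for k in range(n)
            if k != i
        }
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        weight_sum = sum(weights[i][j] for j in logits)
        if weight_sum == 0:
            continue  # anchor has no positives; skip it
        # Weighted average of negative log-probabilities of the positives.
        total += -sum(
            weights[i][j] * (logits[j] - log_denom) for j in logits
        ) / weight_sum
        counted += 1
    return total / counted if counted else 0.0
```

In such a sketch, `weights[i][j]` would be filled with `consistency_score` values computed from per-frame predictions, so that pairs of videos whose frames vote for the same categories pull their representations together more strongly.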
Problem

Research questions and friction points this paper is trying to address.

Extending Generalized Category Discovery to the video domain
Integrating temporal-spatial cues for novel category discovery
Addressing the insufficiency of static visual content for reliably discovering novel categories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Memory-guided consistency-aware contrastive learning framework
Consistency-aware contrastive learning that weights the loss by consistency scores estimated from temporal features
Dual-level memory buffer that maintains feature-level and logit-level representations
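One way to picture a dual-level memory buffer is a store that keeps a feature vector and a logit vector per video and refreshes both as training proceeds. The sketch below is an illustration under stated assumptions, not the paper's MGRE module: the class name `DualLevelMemory`, the exponential-moving-average update, the `momentum` value, and keying entries by a video identifier are all hypothetical choices.

```python
class DualLevelMemory:
    """Hypothetical buffer holding one feature vector and one logit
    vector per video, each smoothed with an exponential moving average."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.features = {}  # video_id -> feature-level entry
        self.logits = {}    # video_id -> logit-level entry

    def _ema(self, old, new):
        # Blend the stored vector with the freshly computed one.
        m = self.momentum
        return [m * o + (1 - m) * n for o, n in zip(old, new)]

    def update(self, video_id, feature, logit):
        """Insert a new entry, or smooth an existing one toward the new values."""
        if video_id not in self.features:
            self.features[video_id] = list(feature)
            self.logits[video_id] = list(logit)
        else:
            self.features[video_id] = self._ema(self.features[video_id], feature)
            self.logits[video_id] = self._ema(self.logits[video_id], logit)

    def predicted_class(self, video_id):
        """Index of the largest stored logit for this video."""
        stored = self.logits[video_id]
        return max(range(len(stored)), key=stored.__getitem__)
```

A buffer of this kind would let consistency estimation look beyond the current mini-batch, since stored entries for previously seen videos remain available as global context.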
Zhang Jing
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, China
Pu Nan
Department of Information Engineering and Computer Science, University of Trento, Italy
Xie Yu Xiang
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, China
Guo Yanming
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, China
Lu Qianqi
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, China
Zou Shiwei
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, China
Yan Jie
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, China
Chen Yan
Associate Professor, Zhejiang University, College of EE
CPS Security, Embedded System Security, Sensor Security