StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Video class-incremental learning (VCIL) confronts dual challenges: complex spatiotemporal modeling and catastrophic forgetting. Existing approaches either rely on exemplar rehearsal, which introduces privacy risks and memory overhead, or neglect temporal dynamics entirely. This paper proposes StPR, an exemplar-free, task-agnostic, end-to-end VCIL framework. First, Frame-Shared Semantics Distillation (FSSD) preserves cross-task spatial semantic consistency without storing raw videos. Second, a Temporal Decomposition-based Mixture-of-Experts (TD-MoE), coupled with a dynamic expert-routing mechanism, explicitly models the temporal dynamics of actions and enables task-adaptive inference without task IDs. Third, channel-level selective regularization enhances parameter stability. Evaluated on UCF101, HMDB51, and Kinetics400, the method outperforms state-of-the-art approaches in accuracy while keeping a minimal memory footprint and offering improved interpretability.

📝 Abstract
Video Class-Incremental Learning (VCIL) seeks to develop models that continuously learn new action categories over time without forgetting previously acquired knowledge. Unlike traditional Class-Incremental Learning (CIL), VCIL introduces the added complexity of spatiotemporal structures, making it particularly challenging to mitigate catastrophic forgetting while effectively capturing both frame-shared semantics and temporal dynamics. Existing approaches either rely on exemplar rehearsal, raising concerns over memory and privacy, or adapt static image-based methods that neglect temporal modeling. To address these limitations, we propose Spatiotemporal Preservation and Routing (StPR), a unified and exemplar-free VCIL framework that explicitly disentangles and preserves spatiotemporal information. First, we introduce Frame-Shared Semantics Distillation (FSSD), which identifies semantically stable and meaningful channels by jointly considering semantic sensitivity and classification contribution. These important semantic channels are selectively regularized to maintain prior knowledge while allowing for adaptation. Second, we design a Temporal Decomposition-based Mixture-of-Experts (TD-MoE), which dynamically routes task-specific experts based on their temporal dynamics, enabling inference without task ID or stored exemplars. Together, StPR effectively leverages spatial semantics and temporal dynamics, achieving a unified, exemplar-free VCIL framework. Extensive experiments on UCF101, HMDB51, and Kinetics400 show that our method outperforms existing baselines while offering improved interpretability and efficiency in VCIL. Code is available in the supplementary materials.
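The abstract describes FSSD as scoring channels by jointly considering semantic sensitivity and classification contribution, then selectively regularizing only the important ones. The sketch below illustrates that idea in miniature with numpy; the scoring rule, the `keep_ratio` parameter, and the squared-drift penalty are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def channel_importance(sensitivity, contribution):
    """Combine per-channel semantic sensitivity and classification
    contribution into one importance score (hypothetical scoring rule;
    the paper's exact formulation may differ)."""
    # Normalize each signal so neither term dominates the product.
    s = sensitivity / (sensitivity.sum() + 1e-8)
    c = contribution / (contribution.sum() + 1e-8)
    return s * c

def selective_reg_loss(new_feats, old_feats, importance, keep_ratio=0.25):
    """Penalize feature drift only on the most important channels,
    leaving the remaining channels free to adapt to the new task."""
    k = max(1, int(keep_ratio * importance.size))
    top = np.argsort(importance)[-k:]            # indices of stable channels
    drift = new_feats[:, top] - old_feats[:, top]
    return float((drift ** 2).mean())

rng = np.random.default_rng(0)
sens = rng.random(64)        # per-channel semantic sensitivity (toy values)
contrib = rng.random(64)     # per-channel classification contribution
imp = channel_importance(sens, contrib)

old = rng.normal(size=(8, 64))                   # features from the old-task model
new = old + 0.1 * rng.normal(size=(8, 64))       # slightly drifted new-task features
loss = selective_reg_loss(new, old, imp)
```

The point of the selective penalty is that unimportant channels incur no cost when they change, which is how the framework balances preserving prior knowledge against adapting to new classes.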
Problem

Research questions and friction points this paper is trying to address.

Mitigates catastrophic forgetting in video class-incremental learning
Disentangles spatiotemporal information without exemplar rehearsal
Dynamically routes task-specific experts for temporal modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frame-Shared Semantics Distillation preserves stable semantic channels
Temporal Decomposition-based Mixture-of-Experts routes task-specific experts
Unified exemplar-free VCIL framework disentangles spatiotemporal information
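The TD-MoE bullet above describes routing task-specific experts from temporal dynamics, with no task ID at inference. The following is a minimal sketch of that routing pattern: the temporal descriptor (mean absolute frame difference), the linear experts, and the learned routing keys are all illustrative stand-ins for the paper's actual components.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TinyExpert:
    """One task-specific expert, reduced to a single linear map
    (a stand-in for the paper's per-task temporal experts)."""
    def __init__(self, rng, d_in, d_out):
        self.W = rng.normal(scale=0.1, size=(d_in, d_out))
    def __call__(self, x):
        return x @ self.W

def temporal_descriptor(clip):
    """Summarize a clip of shape (T, D) by its frame-to-frame dynamics:
    mean absolute successive difference (a crude temporal proxy)."""
    return np.abs(np.diff(clip, axis=0)).mean(axis=0)

def route(clip, experts, keys):
    """Task-ID-free inference: gate each expert by how well its routing
    key matches the clip's temporal descriptor, then mix expert outputs."""
    d = temporal_descriptor(clip)
    gates = softmax(keys @ d)          # one gate weight per expert
    frame_avg = clip.mean(axis=0)      # simple spatial summary of the clip
    out = sum(g * ex(frame_avg) for g, ex in zip(gates, experts))
    return out, gates

rng = np.random.default_rng(1)
D, C = 16, 10                          # feature dim, number of classes
experts = [TinyExpert(rng, D, C) for _ in range(3)]
keys = rng.normal(size=(3, D))         # routing keys (illustrative, would be learned)
clip = rng.normal(size=(8, D))         # 8 frames of 16-dim features
logits, gates = route(clip, experts, keys)
```

Because the gate depends only on the input clip's dynamics, inference needs neither a task identifier nor stored exemplars, which is the property the bullets emphasize.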
Huaijie Wang
Xidian University, Xi’an, China
De Cheng
Associate Professor, Xidian University
Computer Vision · Deep Learning · Machine Learning · Data Compression
Guozhang Li
Beijing Normal University, Beijing, China
Zhipeng Xu
Northeastern University
NLP · Information Retrieval
Lingfeng He
Xidian University, Xi’an, China
Jie Li
Xidian University, Xi’an, China
Nan Wang
Xidian University, Xi’an, China
Xinbo Gao
Xidian University, Xi’an, China