Semantic-Guided Unsupervised Video Summarization

📅 2026-01-21
🤖 AI Summary
This work proposes a semantic-guided unsupervised video summarization method that targets three limitations of existing approaches: reliance on single-modality features, lack of semantic guidance, and unstable GAN training. A frame-level semantic alignment attention mechanism, integrated into the keyframe selector, guides a Transformer-based generator within the adversarial framework to better reconstruct videos, bringing high-level semantic information into keyframe selection. An incremental training strategy that progressively updates the model components is used to stabilize GAN optimization. The authors report superior performance on multiple benchmark datasets.

📝 Abstract
Video summarization is a crucial technique for social understanding, enabling efficient browsing of massive multimedia content and extraction of key information from social platforms. Most existing unsupervised summarization methods rely on Generative Adversarial Networks (GANs) to enhance keyframe selection and generate coherent video summaries through adversarial training. However, such approaches primarily exploit unimodal features, overlook the guiding role of semantic information in keyframe selection, and often suffer from unstable training. To address these limitations, we propose a novel Semantic-Guided Unsupervised Video Summarization method. Specifically, we design a frame-level semantic alignment attention mechanism and integrate it into a keyframe selector, which guides the Transformer-based generator within the adversarial framework to better reconstruct videos. In addition, we adopt an incremental training strategy that progressively updates the model components, mitigating the instability of GAN training. Experimental results demonstrate that our approach achieves superior performance on multiple benchmark datasets.
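
The abstract does not specify how the frame-level semantic alignment attention is computed, so the following is only a minimal illustrative sketch, not the authors' method. It assumes each frame has a visual feature and a semantic embedding of the same dimension, uses plain scaled dot-product attention from visual features to semantic embeddings, and scores each frame by the average attention it receives. The function names (alignment_attention, frame_scores, select_keyframes), the scoring rule, and the toy vectors are all hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def alignment_attention(visual, semantic):
    # Hypothetical alignment step: each visual frame feature (query)
    # attends over all per-frame semantic embeddings (keys).
    scale = math.sqrt(len(visual[0]))
    return [softmax([dot(q, k) / scale for k in semantic]) for q in visual]


def frame_scores(weights):
    # Assumed scoring rule: average attention each frame receives.
    n = len(weights)
    return [sum(row[j] for row in weights) / n for j in range(len(weights[0]))]


def select_keyframes(scores, k):
    # Keep the k highest-scoring frames, returned in temporal order.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Toy data (made up): frames 0 and 2 have distinctive semantic embeddings.
visual = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
semantic = [[1.0, 0.0], [0.2, 0.2], [0.0, 1.0], [0.2, 0.2]]

weights = alignment_attention(visual, semantic)
scores = frame_scores(weights)
keyframes = select_keyframes(scores, 2)
print(keyframes)  # [0, 2]
```

In a full adversarial setup the selected or weighted frames would feed a generator and discriminator; that part is omitted here because the abstract gives no architectural details.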
Problem

Research questions and friction points this paper is trying to address.

video summarization
unsupervised learning
semantic guidance
keyframe selection
GAN instability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic-Guided
Unsupervised Video Summarization
Frame-level Semantic Alignment
Incremental Training
Transformer-based Generator
Haizhou Liu
Department of Control Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Haodong Jin
Department of Control Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Yiming Wang
School of Chemical Engineering, East China University of Science and Technology
lifelike soft materials, non-equilibrium materials, supramolecular self-assembly
Hui Yu
Professor of Visual and Cognitive Computing, University of Glasgow
Visual Computing, Cognitive Computing, Social Robot, Parallel Intelligence