MLT-Dedup: Efficient Large-Scale Online Video Deduplication via Multi-Level Representations and Spatial-Temporal Matching

📅 2026-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses near-duplicate videos on online platforms, which degrade user experience and raise storage and bandwidth costs. Existing methods struggle to retrieve enough high-quality candidates under a limited index budget while balancing efficiency and precision. The authors propose MLT-Dedup, a framework that combines a Multi-Level Video Encoder (ML-VE) with DiF-SiM, a Differential Feature-enhanced Similarity Module. Sparse clip-level embeddings drive efficient candidate retrieval, and fine-grained frame-level embeddings are then loaded for precise pairwise matching, where DiF-SiM locates the duplicated temporal segments. Evaluated on a real-world large-scale platform, the method reduces online repetition rates by 91% at 90% precision, and its sparse retrieval design increases indexing capacity fivefold, enabling broader candidate coverage.
📝 Abstract
The explosive growth of user-generated video content on online platforms is accompanied by the emergence of numerous near-duplicate videos: videos that are identical or highly similar but differ by partial edits. These duplicates degrade user experience and increase storage and bandwidth costs, making large-scale video deduplication a critical task. Existing video deduplication frameworks face a fundamental challenge in retrieving sufficient high-quality candidates under a limited index budget, as well as trade-offs between efficiency and precision. To address these issues, we propose MLT-Dedup, an efficient large-scale online video deduplication framework with Multi-Level representations and spatial-Temporal matching. Our approach employs a Multi-Level Video Encoder (ML-VE) to extract both fine-grained frame-level and sparse clip-level embeddings: sparse embeddings support efficient candidate retrieval, while fine-grained embeddings are loaded for precise pairwise matching. During matching, we introduce DiF-SiM, a Differential Feature-enhanced Similarity Module capable of locating duplicated temporal segments and providing reliable similarity evidence to support policy-driven deduplication decisions. Extensive experiments on a real-world large-scale platform demonstrate that MLT-Dedup reduces online repetition rates by 91% at 90% precision. Furthermore, our sparse retrieval design achieves a 5x increase in indexing capacity, enabling broader candidate coverage in real-world deployment.
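The retrieve-then-refine flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: mean-pooling stands in for the learned ML-VE clip embeddings, and a longest-diagonal-run search over frame similarities stands in for DiF-SiM. All function names, the clip length, the threshold, and the example vectors are hypothetical choices made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def clip_embeddings(frames, clip_len=2):
    """Assumption: mean-pool consecutive frames into sparse clip-level vectors."""
    clips = []
    for start in range(0, len(frames), clip_len):
        window = frames[start:start + clip_len]
        clips.append([sum(col) / len(window) for col in zip(*window)])
    return clips

def retrieve(query_frames, index, top_k=1, clip_len=2):
    """Stage 1: rank indexed videos by their best clip-to-clip similarity."""
    q_clips = clip_embeddings(query_frames, clip_len)
    scored = []
    for vid, clips in index.items():
        best = max(cosine(q, c) for q in q_clips for c in clips)
        scored.append((best, vid))
    scored.sort(reverse=True)
    return [vid for _, vid in scored[:top_k]]

def longest_matching_run(query_frames, cand_frames, threshold=0.95):
    """Stage 2: longest run of temporally aligned frame pairs above threshold.

    Returns (length, query_start, candidate_start) as frame indices.
    """
    best = (0, 0, 0)
    prev = [0] * (len(cand_frames) + 1)
    for i, qf in enumerate(query_frames, 1):
        cur = [0] * (len(cand_frames) + 1)
        for j, cf in enumerate(cand_frames, 1):
            if cosine(qf, cf) >= threshold:
                cur[j] = prev[j - 1] + 1  # extend the diagonal run
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best

# Hypothetical frame-level embeddings for two stored videos.
library = {
    "original": [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]],
    "unrelated": [[-1, 0, 0], [0, -1, 0], [0, 0, -1], [-1, -1, 0]],
}
# Only the sparse clip-level vectors go into the retrieval index.
index = {vid: clip_embeddings(frames) for vid, frames in library.items()}

# Query: the original video with its first frame trimmed off.
query = [[0, 1, 0], [0, 0, 1], [1, 1, 0]]

candidates = retrieve(query, index, top_k=1)
segment = longest_matching_run(query, library[candidates[0]])
```

In this sketch the index holds one vector per clip rather than one per frame, which is the kind of saving that lets a fixed index budget cover more videos; the frame-level vectors are read only for the few candidates that survive retrieval.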
Problem

Research questions and friction points this paper is trying to address.

video deduplication
near-duplicate videos
large-scale
online platforms
index budget
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Level Representations
Spatial-Temporal Matching
Video Deduplication
Sparse Retrieval
Differential Feature-enhanced Similarity
David Yuchen Wang
TikTok
Haoying Li
TikTok
Hailun Xu
TikTok
Wei Chee Yew
TikTok
Zirui Zhu
School of Computing, National University of Singapore
Sanjay Saha
PhD in Computer Science, National University of Singapore
Computer Vision, Biometrics, Machine Learning
Hao Hei
TikTok
Kanchan Sarkar
TikTok
Kun Xu
TikTok