Explainable Forensics of Manipulated Segments in Untrimmed Long Videos

📅 2026-06-01

🤖 AI Summary
Existing video forensic methods struggle to detect and localize sparsely distributed AI-generated manipulations in untrimmed long videos, and they often lack interpretability. This work formulates the task of temporal AI-generated segment localization and explanation and presents TASLE, a large-scale benchmark annotated with temporal boundaries, authenticity labels, and segment-level rationales. To address the task, the authors propose MSLoc, a coarse-to-fine baseline that combines boundary-sensitive proposal generation with a multimodal large language model (MLLM) refinement and explanation module. Experiments on TASLE validate the effectiveness of MSLoc and highlight the importance of segment-level explainable analysis for detecting AI-generated content in long videos.
📝 Abstract
The rapid advancement of AI-driven video generation has transformed content creation, while simultaneously increasing the risk of misinformation through localized manipulations in long-form videos. Existing video forensic methods predominantly operate on short, independent clips, and thus fail to capture realistic scenarios where AI-generated content is sparsely embedded within otherwise authentic footage. To bridge this gap, we formulate the task of Temporal AI-Generated Segment Localization and Explanation, which targets authenticity detection, temporal localization, and interpretable analysis of manipulated segments in untrimmed long videos. We further introduce TASLE, a large-scale benchmark comprising 12,472 untrimmed videos with diverse manipulation patterns and rich annotation signals, including temporal boundaries, authenticity labels, and segment-level rationales. In addition, we propose MSLoc, a coarse-to-fine forensic baseline that combines a boundary-sensitive proposal generation module for efficient long-video scanning with an MLLM-based refinement module for precise boundary localization and interpretable reasoning. Experiments validate the effectiveness of the proposed baseline, highlighting the importance of segment-level explainable forensics for long-form AI-generated video analysis. Our dataset and code are publicly available at https://debby-0527.github.io/TASLE.
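The localization task described above outputs time spans, and temporal intersection over union (tIoU) is a common way to compare a predicted span with an annotated one. The abstract does not state which metric TASLE uses, so the sketch below is only an illustration under that assumption. The `Segment` record and its field names are hypothetical and loosely mirror the annotation signals listed in the abstract (temporal boundaries, authenticity label, rationale); they are not the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A labeled time span in seconds (hypothetical record layout)."""
    start: float
    end: float
    is_generated: bool
    rationale: str = ""


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two time spans on the video timeline."""
    # Overlap is zero when the spans do not intersect.
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


# Made-up example values, not taken from the benchmark.
truth = Segment(40.0, 50.0, True, "texture flicker on the face region")
prediction = Segment(42.0, 52.0, True)
print(temporal_iou(truth, prediction))
```

Here the two spans overlap for 8 seconds and together cover 12 seconds, so the score is 8/12. A prediction is typically counted as correct when its tIoU with an annotated segment exceeds a chosen threshold.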
Problem

Research questions and friction points this paper is trying to address.

video forensics
AI-generated content
temporal localization
explainable AI
long-form video
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Localization
Explainable Forensics
Long-form Video
AI-generated Content Detection
Multimodal Large Language Model
Yue Feng
MoE Key Laboratory of Brain-Machine Intelligence Technology, College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics
Jingjing Li
University of Electronic Science and Technology of China
Domain Adaptation, Zero-shot Learning, Multimedia Understanding, Transfer Learning
Qijia Lu
MoE Key Laboratory of Brain-Machine Intelligence Technology, College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics
Wei Ji
Independent Scholar
Jingrou Zhang
MoE Key Laboratory of Brain-Machine Intelligence Technology, College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics
Fei Shen
National University of Singapore
Controllable Generation, Multimodal Safety
Xiao Li
Nanjing University
Natural Language Processing, Reasoning
Yizhen Jia
MoE Key Laboratory of Brain-Machine Intelligence Technology, College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics
Qiang Chen
Independent Scholar
Limin Wang
Nanjing University
Computer Vision, Action Recognition, Video Understanding
Wentong Li
Nanjing University of Aeronautics and Astronautics
Computer Vision, Machine Learning, Vision-Language Model, Robotics
Jie Qin
Professor, Nanjing University of Aeronautics and Astronautics
Computer Vision, Machine Learning, Pattern Recognition