🤖 AI Summary
Existing video forensic methods struggle to detect and localize sparsely distributed AI-generated manipulations in untrimmed long videos, and they often lack interpretability. This work introduces, for the first time, the task of temporal AI-generated segment localization and explanation, and presents TASLE, the first large-scale benchmark dataset annotated with temporal boundaries, authenticity labels, and segment-level rationales. To address this task, the authors propose MSLoc, a coarse-to-fine method that pairs boundary-sensitive candidate segment generation with a refinement and explanation module driven by a multimodal large language model (MLLM), yielding interpretable localization. Experiments on TASLE validate the effectiveness of MSLoc and underline the importance of segment-level explainable analysis for detecting AI-generated content in long videos.
📝 Abstract
The rapid advancement of AI-driven video generation has transformed content creation, while simultaneously increasing the risk of misinformation through localized manipulations in long-form videos. Existing video forensic methods predominantly operate on short, independent clips, and thus fail to capture realistic scenarios where AI-generated content is sparsely embedded within otherwise authentic footage. To bridge this gap, we formulate the task of Temporal AI-Generated Segment Localization and Explanation, which targets authenticity detection, temporal localization, and interpretable analysis of manipulated segments in untrimmed long videos. We further introduce TASLE, a large-scale benchmark comprising 12,472 untrimmed videos with diverse manipulation patterns and rich annotation signals, including temporal boundaries, authenticity labels, and segment-level rationales. In addition, we propose MSLoc, a coarse-to-fine forensic baseline that combines a boundary-sensitive proposal generation module for efficient long-video scanning with an MLLM-based refinement module for precise boundary localization and interpretable reasoning. Experiments validate the effectiveness of the proposed baseline, highlighting the importance of segment-level explainable forensics for long-form AI-generated video analysis. Our dataset and code are publicly available at https://debby-0527.github.io/TASLE.
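To make the temporal localization part of the task concrete, the sketch below scores predicted manipulated segments against annotated ones using temporal intersection-over-union (IoU). The abstract does not say which metric TASLE uses, so temporal IoU, the 0.5 threshold, and the names `Segment`, `temporal_iou`, and `recall_at_iou` are all assumptions made for illustration, not the authors' evaluation protocol.

```python
from typing import List, Tuple

# Hypothetical representation: a segment is (start_sec, end_sec).
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two time intervals, in [0, 1]."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds: List[Segment], gts: List[Segment], thr: float = 0.5) -> float:
    """Fraction of ground-truth segments matched by at least one prediction.

    The threshold value is an assumption; the abstract does not specify one.
    """
    if not gts:
        return 0.0
    hits = sum(any(temporal_iou(p, g) >= thr for p in preds) for g in gts)
    return hits / len(gts)


if __name__ == "__main__":
    # Two annotated AI-generated segments in one long video (made-up values).
    ground_truth = [(10.0, 20.0), (40.0, 50.0)]
    predictions = [(10.0, 20.0)]
    print(recall_at_iou(predictions, ground_truth))
```

A metric of this kind rewards precise boundaries rather than video-level authenticity alone, which is the distinction the task draws from clip-level forensics.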