🤖 AI Summary
This paper introduces Causal Abductive Reasoning on Video Events (CARVE), a new task that identifies the root-cause trigger events leading to a target event in a video and generates interpretable causal chains that explain it. To support CARVE, we construct two new benchmark datasets comprising both synthetic and real-world videos, with trigger-target labels generated automatically by a counterfactual synthesis method. We further introduce the Causal Event Relation Network (CERN), which jointly models temporal dynamics, multi-granularity event representations, and causal relation discrimination. Experiments show that CERN improves trigger-event identification accuracy over baselines and that effective representation and interaction modeling of event relations are critical for video-based causal reasoning. The task, datasets, and model are intended to support applications such as intelligent video surveillance and root-cause failure analysis in complex systems.
📝 Abstract
This paper introduces a new problem, Causal Abductive Reasoning on Video Events (CARVE), which involves identifying causal relationships between events in a video and generating hypotheses about the causal chains that account for the occurrence of a target event. To facilitate research in this direction, we create two new benchmark datasets with both synthetic and realistic videos, accompanied by trigger-target labels generated through a novel counterfactual synthesis approach. To explore the challenge of solving CARVE, we present a Causal Event Relation Network (CERN) that examines the relationships between video events in temporal and semantic spaces to efficiently determine the root-cause trigger events. Through extensive experiments, we demonstrate the critical roles of event relational representation learning and interaction modeling in solving video causal reasoning challenges. The introduction of the CARVE task, along with the accompanying datasets and the CERN framework, will advance future research on video causal reasoning and significantly facilitate various applications, including video surveillance, root-cause analysis, and movie content management.