What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning

📅 2025-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Procedural activity understanding requires modeling bidirectional causal relationships between action steps and scene states, yet existing video representation methods lack explicit modeling of state transitions. To address this, we propose a novel framework integrating state-change supervision with counterfactual reasoning. First, we leverage large language models (LLMs) to automatically generate fine-grained state-change descriptions as weak supervision signals. Second, we construct counterfactual variants of state changes to strengthen “if–then” causal reasoning. Finally, we jointly train a video encoder with downstream tasks—temporal action segmentation and error detection—to unify modeling of both normal and anomalous procedural behaviors. This work is the first to incorporate LLM-generated state descriptions and their counterfactuals into procedural video representation learning. Experiments on multiple benchmarks demonstrate substantial performance gains, validating the efficacy of explicit state modeling and counterfactual augmentation for causal understanding.
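The summary gives no implementation details, so the following is a minimal illustrative sketch (not the authors' code) of how state-change supervision with counterfactual negatives could be cast as a contrastive objective: a clip embedding is scored against its true state-change description and against counterfactual ("what if") descriptions with an InfoNCE-style loss. The function names, the random embedding setup, and the loss choice are all assumptions.

```python
import numpy as np

def info_nce_loss(video_emb, pos_text_emb, neg_text_embs, temperature=0.07):
    """Contrastive loss: pull a clip embedding toward its matching
    state-change description, push it away from counterfactual ones."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Similarity logits, with the true description at index 0.
    logits = np.array(
        [cos(video_emb, pos_text_emb)]
        + [cos(video_emb, n) for n in neg_text_embs]
    ) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # cross-entropy on index 0

rng = np.random.default_rng(0)
clip = rng.normal(size=128)                      # stand-in video feature
true_desc = clip + 0.1 * rng.normal(size=128)    # well-aligned description
counterfactuals = [rng.normal(size=128) for _ in range(3)]  # "what if" negatives

aligned_loss = info_nce_loss(clip, true_desc, counterfactuals)
# A mismatched positive should incur a much larger loss:
mismatched_loss = info_nce_loss(clip, counterfactuals[0], [true_desc])
```

In this toy setup the counterfactual descriptions act as hard negatives, so an encoder trained this way must distinguish a step's actual outcome from plausible failure outcomes.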

📝 Abstract
Understanding a procedural activity requires modeling both how action steps transform the scene and how evolving scene transformations can in turn influence the sequence of action steps, including steps that are accidental or erroneous. Existing work on procedure-aware video representations has proposed approaches such as modeling the temporal order of actions, but has not explicitly learned state changes (scene transformations). In this work, we study procedure-aware video representation learning by incorporating state-change descriptions generated by Large Language Models (LLMs) as supervision signals for video encoders. Moreover, we generate state-change counterfactuals that simulate hypothesized failure outcomes, allowing models to learn by imagining unseen "What if" scenarios. This counterfactual reasoning helps the model understand the cause and effect of each step in an activity. To verify the procedure awareness of our model, we conduct extensive experiments on procedure-aware tasks, including temporal action segmentation and error detection. Our results demonstrate the effectiveness of the proposed state-change descriptions and their counterfactuals, with significant improvements on multiple tasks. We will make our source code and data publicly available soon.
Problem

Research questions and friction points this paper is trying to address.

Modeling scene transformations in procedural activities
Learning state changes using LLM-generated descriptions
Enhancing understanding via counterfactual reasoning on failures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLM-generated state-change descriptions as supervision
Generates state-change counterfactuals for unseen scenarios
Enhances cause-effect understanding in procedural activities
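As a hypothetical illustration of the second bullet, counterfactual state changes could be elicited by prompting an LLM about failure outcomes of a step. The template, wording, and example step below are assumptions for illustration only; the paper's actual prompts are not shown in this summary.

```python
# Hypothetical prompt template for eliciting counterfactual state changes
# from an LLM; not the paper's actual prompt.
COUNTERFACTUAL_PROMPT = (
    "The step '{step}' normally changes the scene as follows: {state_change}. "
    "Describe the resulting scene state if this step failed or was skipped."
)

def build_counterfactual_prompt(step, state_change):
    """Fill the template with a step and its normal state change."""
    return COUNTERFACTUAL_PROMPT.format(step=step, state_change=state_change)

prompt = build_counterfactual_prompt(
    "whisk the eggs",
    "the eggs turn from separate yolks and whites into a uniform yellow mixture",
)
```

The LLM's response to such a prompt would then serve as the counterfactual description paired with the clip during training.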