🤖 AI Summary
This work addresses two challenges in temporal action segmentation: partial annotation, where only “known actions” are labeled while many “unknown actions” remain unlabeled, and ambiguous action boundaries. It introduces a task termed “action discovery”: identifying, segmenting, and clustering both known and unknown actions when only the known ones are annotated. The method has two components. A Granularity-Guided Segmentation Module (GGSM) finds temporal intervals for known and unknown actions at the granularity of the annotated actions. An Unknown Action Segment Assignment module (UASA) then groups the unknown segments into semantically meaningful classes using learned embedding similarities, with the known annotations guiding both temporal localization and semantic clustering. Experiments on Breakfast, 50Salads, and Desktop Assembly show considerable improvements over existing baselines.
📝 Abstract
We introduce Action Discovery, a novel setup within Temporal Action Segmentation that addresses two challenges in partially labeled datasets: actions that are ambiguous to define and annotate, and annotations that are incomplete. In this setup, only a subset of actions, referred to as known actions, is annotated in the training data, while other unknown actions remain unlabeled. This scenario is particularly relevant in domains like neuroscience, where well-defined behaviors (e.g., walking, eating) coexist with subtle or infrequent actions that are often overlooked, as well as in applications where datasets are inherently partially annotated due to ambiguous or missing labels. To address this problem, we propose a two-step approach that leverages the known annotations to guide both the temporal and semantic granularity of unknown action segments. First, we introduce the Granularity-Guided Segmentation Module (GGSM), which identifies temporal intervals for both known and unknown actions by mimicking the granularity of annotated actions. Second, we propose the Unknown Action Segment Assignment (UASA), which identifies semantically meaningful classes within the unknown actions based on learned embedding similarities. We systematically explore the proposed setting of Action Discovery on three challenging datasets (Breakfast, 50Salads, and Desktop Assembly), demonstrating that our method considerably improves upon existing baselines.
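To make the second step concrete, here is a minimal sketch of grouping unknown segments by embedding similarity. It is not the authors' UASA: the mean pooling, the greedy grouping rule, the 0.8 threshold, and the function names `segment_embeddings` and `group_by_similarity` are all assumptions chosen for illustration, and the "features" are toy 2-D vectors rather than learned embeddings.

```python
import numpy as np

def segment_embeddings(frame_features, segments):
    """Mean-pool frame features over each (start, end) interval (end exclusive),
    then L2-normalize so dot products act as cosine similarities."""
    pooled = np.stack([frame_features[s:e].mean(axis=0) for s, e in segments])
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

def group_by_similarity(embeddings, threshold=0.8):
    """Greedy grouping (illustrative assumption, not the paper's procedure):
    a segment joins the most similar existing group if the cosine similarity
    to that group's mean direction reaches `threshold`; otherwise it opens
    a new group."""
    sums = []    # running sum of member embeddings, one entry per group
    labels = []
    for emb in embeddings:
        best, best_sim = -1, threshold
        for k, total in enumerate(sums):
            sim = float(total @ emb / np.linalg.norm(total))
            if sim >= best_sim:
                best, best_sim = k, sim
        if best < 0:
            sums.append(emb.copy())
            labels.append(len(sums) - 1)
        else:
            sums[best] += emb
            labels.append(best)
    return labels

# Toy data: 12 frames with 2-D features, split into three unlabeled segments.
frame_features = np.array([[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4 + [[0.9, 0.1]] * 4)
segments = [(0, 4), (4, 8), (8, 12)]

labels = group_by_similarity(segment_embeddings(frame_features, segments))
print(labels)  # [0, 1, 0]
```

The first and third segments land in the same group because their pooled features point in nearly the same direction, while the second forms its own group. In the setting described above, the segment boundaries would come from the segmentation step and the embeddings would be learned with guidance from the known-action annotations.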