Multi Activity Sequence Alignment via Implicit Clustering

📅 2025-03-16

📈 Citations: 0

✨ Influential: 0

career value

194K/year

🤖 AI Summary

Existing self-supervised time-series alignment methods are limited to single-activity scenarios, requiring activity-specific modeling and thus failing to generalize to multi-activity or cross-modal video alignment. To address this, we propose the first unified framework that jointly optimizes implicit segment-level clustering and frame-level alignment. Our approach leverages dual-path data augmentation and multimodal feature fusion to achieve self-supervised alignment across activities and modalities. Crucially, we co-optimize temporal alignment with unsupervised segment clustering, enhancing both discriminability and generalizability of learned representations. Evaluated on three major benchmarks—H2O, PennAction, and IKEA ASM—our method consistently outperforms state-of-the-art approaches. It achieves significant gains in multi-activity alignment accuracy and cross-task transferability, demonstrating superior robustness and scalability in complex, real-world settings.

Technology Category

Application Category

📝 Abstract

Self-supervised temporal sequence alignment can provide rich and effective representations for a wide range of applications. However, existing methods for achieving optimal performance are mostly limited to aligning sequences of the same activity only and require separate models to be trained for each activity. We propose a novel framework that overcomes these limitations using sequence alignment via implicit clustering. Specifically, our key idea is to perform implicit clip-level clustering while aligning frames in sequences. This coupled with our proposed dual augmentation technique enhances the network's ability to learn generalizable and discriminative representations. Our experiments show that our proposed method outperforms state-of-the-art results and highlight the generalization capability of our framework with multi activity and different modalities on three diverse datasets, H2O, PennAction, and IKEA ASM. We will release our code upon acceptance.

Problem

Research questions and friction points this paper is trying to address.

Aligns multi-activity sequences using implicit clustering

Enhances generalizable and discriminative representations

Outperforms state-of-the-art on diverse datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicit clip-level clustering for sequence alignment

Dual augmentation enhances generalizable representations

Outperforms state-of-the-art on diverse datasets

🔎 Similar Papers

Information Fusion in Multimodal IoT Systems for physical activity level monitoring