Hands-On: Segmenting Individual Signs from Continuous Sequences

📅 2025-04-11
🤖 AI Summary
This work addresses the challenge of precise sign boundary segmentation in continuous sign language videos, a fundamental prerequisite for sign language translation and annotation. We propose an end-to-end segmentation framework based on BIO (Begin-In-Out) sequence labeling. To our knowledge, this is the first application of the Transformer architecture to sign language segmentation. We further introduce a multimodal representation that jointly encodes hand features extracted by HaMeR and 3D angles to strengthen temporal modeling. Evaluated on the DGS Corpus, our method achieves state-of-the-art performance, and its features surpass prior benchmarks on the BSL Corpus. These results indicate that combining HaMeR hand features with 3D angles improves segmentation accuracy.

📝 Abstract
This work tackles continuous sign language segmentation, a key task with major implications for sign language translation and data annotation. We propose a transformer-based architecture that models the temporal dynamics of signing and frames segmentation as a sequence labeling problem using the Begin-In-Out (BIO) tagging scheme. Our method leverages HaMeR hand features, complemented with 3D angles. Extensive experiments show that our model achieves state-of-the-art results on the DGS Corpus, while our features surpass prior benchmarks on the BSL Corpus.
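
To make the BIO framing concrete, here is a minimal sketch of how annotated sign spans could be turned into per-frame tags and how predicted tags could be turned back into segments. It is an illustration under stated assumptions, not the paper's implementation: the function names `spans_to_bio` and `bio_to_spans` are hypothetical, spans are assumed to be `(start, end)` frame indices with an exclusive end, and each sign is assumed to get exactly one `B` frame.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_frame, end_frame), end exclusive (assumption)


def spans_to_bio(num_frames: int, spans: List[Span]) -> List[str]:
    """Label each frame as B (sign begins), I (inside a sign) or O (outside)."""
    tags = ["O"] * num_frames
    for start, end in spans:
        if not 0 <= start < end <= num_frames:
            raise ValueError(f"invalid span {(start, end)} for {num_frames} frames")
        tags[start] = "B"
        for t in range(start + 1, end):
            tags[t] = "I"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover (start, end) segments from a per-frame B/I/O tag sequence."""
    spans: List[Span] = []
    start = None  # start frame of the sign currently being read, if any
    for t, tag in enumerate(tags):
        if tag == "B":
            if start is not None:  # a new sign begins right after the previous one
                spans.append((start, t))
            start = t
        elif tag == "I":
            if start is None:  # lenient choice: an I without a B opens a sign
                start = t
        else:  # "O" closes any open sign
            if start is not None:
                spans.append((start, t))
                start = None
    if start is not None:  # sign runs to the last frame
        spans.append((start, len(tags)))
    return spans


example_tags = spans_to_bio(10, [(1, 4), (4, 6), (8, 9)])
example_spans = bio_to_spans(example_tags)
```

The `B` tag is what lets two back-to-back signs, such as frames 1 to 3 and 4 to 5 in the example, be recovered as separate segments; with only In/Out labels they would merge into one.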

Problem

Research questions and friction points this paper is trying to address.

Segmenting individual signs from continuous sign sequences
Improving sign language translation and data annotation
Modeling the temporal dynamics of continuous signing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based architecture for temporal dynamics
BIO tagging scheme for sequence labeling
HaMeR hand features with 3D Angles
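
The page does not say how the HaMeR hand features and the 3D angles are combined before they reach the transformer. One simple possibility, shown here purely as an assumption, is to concatenate the two vectors frame by frame so that each time step carries a single fused feature vector. The function name `fuse_frame_features` and the toy dimensions are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fuse_frame_features(
    hand_feats: Sequence[Sequence[float]],
    angle_feats: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate two per-frame feature streams of equal length (assumed fusion)."""
    if len(hand_feats) != len(angle_feats):
        raise ValueError("both streams must cover the same number of frames")
    # One fused vector per frame: hand features first, then the angles.
    return [list(h) + list(a) for h, a in zip(hand_feats, angle_feats)]


# Toy input: 2 frames, 4-dim hand features, 3-dim angle features (made-up sizes).
hand = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
angles = [[10.0, 20.0, 30.0], [15.0, 25.0, 35.0]]
fused = fuse_frame_features(hand, angles)
```

Each fused vector would then be one input step of the sequence model, which predicts a B, I or O tag per frame.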