POSESTITCH-SLT: Linguistically Inspired Pose-Stitching for End-to-End Sign Language Translation

📅 2025-10-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Sign language translation faces severe low-resource challenges due to the scarcity of large-scale, sentence-level aligned data. To address this, we propose a linguistics-inspired pose-stitching pre-training framework that enables end-to-end translation without requiring gloss annotations. First, synthetic sentence pairs are generated from linguistic templates to provide strong supervision signals. Second, pose sequences are concatenated (stitched) during pre-training to model temporal dependencies across consecutive signs. The method uses a standard Transformer encoder-decoder, trained jointly on template-driven synthetic pairs and concatenation-augmented data. Evaluated on How2Sign and iSign, the approach achieves BLEU-4 scores of 4.56 (+2.59) and 3.43 (+2.88), respectively, substantially outperforming prior state-of-the-art methods for pose-based gloss-free translation. This work offers a scalable, gloss-free paradigm for low-resource sign language translation.
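The summary names two ingredients: template-generated sentences and concatenation ("stitching") of pose sequences. The Python sketch below illustrates that idea under assumptions the source does not state: each word has an isolated pose clip in a hypothetical `lexicon`, a frame is a list of (x, y) keypoints, sign order follows the template's word order, and adjacent clips are joined with linearly interpolated transition frames. The names `stitch` and `generate_pair` are hypothetical; this is an illustration, not the paper's implementation.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical data layout (not from the paper):
# a frame is a list of (x, y) keypoints; a clip is a list of frames for one sign.
Frame = List[Tuple[float, float]]
PoseClip = List[Frame]


def interpolate(a: Frame, b: Frame, steps: int) -> List[Frame]:
    """Return `steps` linearly interpolated frames strictly between frames a and b."""
    frames = []
    for s in range(1, steps + 1):
        t = s / (steps + 1)
        frames.append([(ax + t * (bx - ax), ay + t * (by - ay))
                       for (ax, ay), (bx, by) in zip(a, b)])
    return frames


def stitch(clips: List[PoseClip], transition: int = 2) -> PoseClip:
    """Concatenate per-sign clips into one sequence, bridging each seam with
    `transition` interpolated frames (an assumed smoothing choice)."""
    sequence: PoseClip = []
    for clip in clips:
        if sequence and transition > 0:
            sequence.extend(interpolate(sequence[-1], clip[0], transition))
        sequence.extend(clip)
    return sequence


def generate_pair(template: List[str], slots: Dict[str, List[str]],
                  lexicon: Dict[str, PoseClip], rng: random.Random,
                  transition: int = 2) -> Tuple[PoseClip, str]:
    """Fill each slot token in `template` with a random filler word, then stitch
    the lexicon clips for the resulting words into one synthetic pose sequence.
    Simplification: sign order is assumed to match the template's word order."""
    words = [rng.choice(slots[tok]) if tok in slots else tok for tok in template]
    poses = stitch([lexicon[w] for w in words], transition)
    return poses, " ".join(words)


if __name__ == "__main__":
    # Toy lexicon with a single keypoint per frame.
    lexicon = {
        "i": [[(0.0, 0.0)], [(0.0, 1.0)]],
        "want": [[(3.0, 1.0)]],
        "tea": [[(6.0, 0.0)]],
        "water": [[(9.0, 0.0)]],
    }
    template = ["i", "want", "<drink>"]
    slots = {"<drink>": ["tea", "water"]}
    rng = random.Random(0)
    for _ in range(3):
        poses, text = generate_pair(template, slots, lexicon, rng)
        print(text, len(poses))  # 2 + 2 + 1 + 2 + 1 = 8 frames per pair
```

Randomly filling template slots yields many distinct pose-sentence pairs from a small lexicon. Linear interpolation at the seams is only one simple option; the paper may handle transitions between signs differently.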

📝 Abstract

Sign language translation remains a challenging task due to the scarcity of large-scale, sentence-aligned datasets. Prior work has focused on various feature extraction methods and architectural changes to support neural machine translation for sign languages. We propose POSESTITCH-SLT, a novel pre-training scheme inspired by linguistic-template-based sentence generation techniques. Through translation experiments on two sign language datasets, How2Sign and iSign, we show that a simple transformer-based encoder-decoder architecture outperforms the prior art when template-generated sentence pairs are included in training. We improve BLEU-4 from 1.97 to 4.56 on How2Sign and from 0.55 to 3.43 on iSign, surpassing prior state-of-the-art methods for pose-based gloss-free translation. The results demonstrate the effectiveness of template-driven synthetic supervision in low-resource sign language settings.
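The reported gains can be checked directly from the abstract's BLEU-4 numbers. The small computation below reproduces the absolute improvements and also expresses them as ratios over the prior score, which shows the relative improvement is largest on iSign, where the prior baseline was lowest. The helper name `gains` is illustrative only.

```python
# BLEU-4 scores quoted in the abstract: (prior best, POSESTITCH-SLT)
scores = {"How2Sign": (1.97, 4.56), "iSign": (0.55, 3.43)}


def gains(before: float, after: float) -> tuple:
    """Return the absolute BLEU-4 gain and the ratio after/before, rounded to 2 places."""
    return round(after - before, 2), round(after / before, 2)


for name, (before, after) in scores.items():
    delta, ratio = gains(before, after)
    print(f"{name}: +{delta} BLEU-4 ({ratio}x the prior score)")
```

Absolute BLEU-4 remains low on both datasets, so the ratios mainly indicate how weak the earlier gloss-free pose-based baselines were.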

Problem

Research questions and friction points this paper is trying to address.

Addressing data scarcity in sign language translation with pose-stitching pre-training

Improving gloss-free translation using linguistic template generation

Enhancing low-resource SLT through template-driven synthetic supervision

Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel pre-training scheme using linguistic templates

Template-generated sentence pairs enhance training

Transformer encoder-decoder with synthetic supervision

🔎 Similar Papers

SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale