POSESTITCH-SLT: Linguistically Inspired Pose-Stitching for End-to-End Sign Language Translation

📅 2025-10-31
🤖 AI Summary
Sign language translation faces severe low-resource challenges due to the scarcity of large-scale, sentence-level aligned data. To address this, we propose a linguistics-inspired pose-stitching pre-training framework that enables end-to-end translation without requiring gloss annotations. First, synthetic sentence pairs are generated from linguistic templates to provide strong supervision signals. Second, a pose-sequence concatenation (stitching) strategy is used during pre-training to explicitly model temporal dependencies across signs. The method uses a Transformer encoder-decoder architecture trained jointly on template-driven synthetic supervision and concatenation-augmented data. On How2Sign and iSign, the approach achieves BLEU-4 scores of 4.56 (+2.59 over the previous best of 1.97) and 3.43 (+2.88 over the previous best of 0.55), respectively, outperforming prior state-of-the-art methods. This work offers a scalable, gloss-free paradigm for low-resource sign language translation.
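The summary describes two ingredients: filling linguistic templates to get synthetic sentences, and stitching pose sequences into sentence-level inputs. The sketch below illustrates that idea under explicit assumptions and is not the authors' implementation: a hypothetical word-level pose lexicon maps each word to a list of keypoint frames, templates mark slots with `{SLOT}` placeholders, and the optional linear blend between clips (`transition_frames`) is an illustrative choice that the summary does not specify. All function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical pose format: each frame is a flat tuple of keypoint coordinates.
Frame = Tuple[float, ...]
PoseClip = List[Frame]


def fill_template(template: str, slots: Dict[str, Sequence[str]],
                  rng: random.Random) -> List[str]:
    """Replace each {SLOT} token with a random word from that slot's vocabulary."""
    words = []
    for token in template.split():
        if token.startswith("{") and token.endswith("}"):
            words.append(rng.choice(slots[token[1:-1]]))
        else:
            words.append(token)  # literal template word, kept as-is
    return words


def stitch_poses(words: Sequence[str], lexicon: Dict[str, PoseClip],
                 transition_frames: int = 0) -> PoseClip:
    """Concatenate word-level pose clips in sentence order.

    With transition_frames > 0 (an assumption, not from the source), linearly
    interpolated frames are inserted between the last frame of one clip and
    the first frame of the next.
    """
    stitched: PoseClip = []
    for word in words:
        clip = lexicon[word]  # raises KeyError if the word has no pose clip
        if stitched and transition_frames > 0:
            start, end = stitched[-1], clip[0]
            for k in range(1, transition_frames + 1):
                alpha = k / (transition_frames + 1)
                stitched.append(tuple(a + alpha * (b - a) for a, b in zip(start, end)))
        stitched.extend(clip)
    return stitched


def make_synthetic_pair(template: str, slots: Dict[str, Sequence[str]],
                        lexicon: Dict[str, PoseClip], rng: random.Random,
                        transition_frames: int = 0) -> Tuple[PoseClip, str]:
    """Build one (stitched pose sequence, target sentence) training pair."""
    words = fill_template(template, slots, rng)
    return stitch_poses(words, lexicon, transition_frames), " ".join(words)


if __name__ == "__main__":
    # Toy lexicon with 2-D "keypoints"; real pose clips would be far larger.
    lexicon = {
        "I": [(0.0, 0.0), (0.1, 0.1)],
        "want": [(1.0, 1.0)],
        "tea": [(2.0, 2.0), (2.1, 2.1)],
        "coffee": [(3.0, 3.0)],
    }
    poses, text = make_synthetic_pair("I want {DRINK}", {"DRINK": ["tea", "coffee"]},
                                      lexicon, random.Random(0))
    print(text, len(poses))
```

Plain concatenation (`transition_frames=0`) is the default because the summary only mentions concatenating pose sequences; blending is shown as one possible way to soften the discontinuity at clip boundaries.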


📝 Abstract
Sign language translation remains a challenging task due to the scarcity of large-scale, sentence-aligned datasets. Prior work has focused on various feature-extraction and architectural changes to support neural machine translation for sign languages. We propose POSESTITCH-SLT, a novel pre-training scheme inspired by linguistic-template-based sentence generation techniques. Through translation experiments on two sign language datasets, How2Sign and iSign, we show that a simple transformer-based encoder-decoder architecture outperforms the prior art when template-generated sentence pairs are included in training. We improve BLEU-4 from 1.97 to 4.56 on How2Sign and from 0.55 to 3.43 on iSign, surpassing prior state-of-the-art methods for pose-based gloss-free translation. The results demonstrate the effectiveness of template-driven synthetic supervision in low-resource sign language settings.
Problem

Research questions and friction points this paper is trying to address.

Addressing data scarcity in sign language translation with pose-stitching pre-training
Improving gloss-free translation using linguistic template generation
Enhancing low-resource SLT through template-driven synthetic supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel pre-training scheme using linguistic templates
Template-generated sentence pairs enhance training
Transformer encoder-decoder with synthetic supervision
Abhinav Joshi
Department of Computer Science and Engineering, Indian Institute of Technology Kanpur (IIT Kanpur)
Vaibhav Sharma
Department of Computer Science and Engineering, Indian Institute of Technology Kanpur (IIT Kanpur)
Sanjeet Singh
Department of Computer Science and Engineering, Indian Institute of Technology Kanpur (IIT Kanpur)
Ashutosh Modi
Indian Institute of Technology Kanpur
Natural Language Processing, Machine and Deep Learning, Artificial Intelligence, Affective Computing, Legal AI