A Transformer-Based Framework for Greek Sign Language Production using Extended Skeletal Motion Representations

📅 2025-03-04
🤖 AI Summary
This work addresses communication barriers between the Deaf and hard-of-hearing community in Greece and the hearing community. It introduces what the authors describe as the first deep learning model for Greek Sign Language Production (SLP): a Transformer-based framework that translates Greek text into sign pose sequences (human pose keypoints) and vice versa. Methodologically: (1) glosses are generated in a data-driven way, and poses are modeled with extended skeletal motion representations; (2) the model is additionally trained through video-to-text translation, and a scheduling algorithm governs the switch between teacher-forcing and auto-regressive decoding. The approach is evaluated on the Greek SL dataset Elementary23 through comparative analyses and ablation studies, which indicate that each of these components improves the quality of the generated sign videos. As the first attempt at Greek SLP, the work may also serve as a reference point for production in other low-resource sign languages.
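Neither the summary nor the abstract specifies what the extended skeletal motion representation contains. The sketch below assumes one common choice, offered as an illustration only: each frame's 2D keypoints are flattened and concatenated with per-joint velocities (frame-to-frame differences). The function name `extend_with_velocity` and the (x, y) input layout are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical layout: a frame is a list of (x, y) joint coordinates.
Frame = List[Tuple[float, float]]


def extend_with_velocity(frames: List[Frame]) -> List[List[float]]:
    """Flatten each frame's keypoints and append per-joint velocities.

    Velocity at frame t is (frame t) - (frame t-1); the first frame has no
    predecessor, so its velocity is zero. Each output vector therefore has
    length 4 * num_joints: [x0, y0, x1, y1, ..., vx0, vy0, vx1, vy1, ...].
    """
    features: List[List[float]] = []
    for t, frame in enumerate(frames):
        prev = frames[t - 1] if t > 0 else frame  # zero velocity at t = 0
        position = [coord for joint in frame for coord in joint]
        velocity = [
            cur - old
            for (cx, cy), (px, py) in zip(frame, prev)
            for cur, old in ((cx, px), (cy, py))
        ]
        features.append(position + velocity)
    return features
```

For two joints over two frames, `extend_with_velocity([[(0.0, 0.0), (1.0, 1.0)], [(0.5, 0.0), (1.0, 2.0)]])` yields 8-dimensional vectors, the second being `[0.5, 0.0, 1.0, 2.0, 0.5, 0.0, 0.0, 1.0]`. Velocity terms give the model explicit motion cues that raw positions only encode implicitly.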

πŸ“ Abstract
Sign languages are the primary form of communication for Deaf communities across the world. To break down the communication barriers between the Deaf and Hard-of-Hearing and the hearing communities, it is imperative to build systems capable of translating spoken language into sign language and vice versa. Building on insights from previous research, we propose a deep learning model for Sign Language Production (SLP), which, to our knowledge, is the first attempt at Greek SLP. We tackle this task with a transformer-based architecture that translates text input into human pose keypoints, and also performs the reverse translation. We evaluate the effectiveness of the proposed pipeline on the Greek SL dataset Elementary23 through a series of comparative analyses and ablation studies. The pipeline's components, namely data-driven gloss generation, training through video-to-text translation, and a scheduling algorithm that governs the switch between teacher-forcing and auto-regressive decoding, appear to improve the quality of the produced SL videos.
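The abstract does not describe how the teacher-forcing / auto-regressive schedule is computed. As an illustration only, the sketch below assumes a linear decay in the spirit of scheduled sampling: the probability of feeding the decoder ground-truth frames starts at 1.0 and falls to 0.0 over training. The function `teacher_forcing_ratio` and its parameters are hypothetical.

```python
def teacher_forcing_ratio(epoch: int, total_epochs: int,
                          start: float = 1.0, end: float = 0.0) -> float:
    """Linearly interpolate the teacher-forcing probability over training.

    Early epochs mostly feed the decoder ground-truth frames (teacher
    forcing); later epochs increasingly feed back its own predictions
    (auto-regressive decoding), which is what it will see at inference.
    """
    if total_epochs <= 1:
        return end
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + (end - start) * progress
```

With 11 epochs the ratio is 1.0 at epoch 0, 0.5 at epoch 5, and 0.0 at epoch 10. Linear decay is the simplest choice; exponential and inverse-sigmoid decays are common alternatives.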
Problem

Research questions and friction points this paper is trying to address.

Develops a transformer-based model for Greek Sign Language Production.
Translates text to sign language and vice versa using human pose keypoints.
Evaluates effectiveness on the Greek SL dataset Elementary23 through comparative analyses and ablation studies.
Innovation

Methods, ideas, or system contributions that make the work stand out.

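Transformer-based architecture for Greek Sign Language Production
Translation from text to human pose keypoints and back
Data-driven gloss generation
Training through video-to-text translation
Scheduling algorithm for teacher-forcing / auto-regressive decoding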
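Teacher forcing and auto-regressive decoding differ only in what the decoder receives as its previous frame. The sketch below shows one decoding pass that mixes the two, assuming a per-step random choice with probability `tf_prob`. The `step` callable is a hypothetical stand-in for the Transformer decoder and, for brevity, conditions only on the previous frame rather than on the full frame history and the encoded text.

```python
import random
from typing import Callable, List, Optional, Sequence

Pose = List[float]  # hypothetical: one flattened keypoint vector per frame


def decode_sequence(
    step: Callable[[Pose], Pose],
    start_pose: Pose,
    targets: Sequence[Pose],
    tf_prob: float,
    rng: Optional[random.Random] = None,
) -> List[Pose]:
    """Run one decoding pass producing len(targets) frames.

    After each step, the next decoder input is the ground-truth frame with
    probability tf_prob (teacher forcing) and otherwise the model's own
    prediction (auto-regressive decoding).
    """
    rng = rng or random.Random(0)
    outputs: List[Pose] = []
    prev_input = start_pose
    for target in targets:
        prediction = step(prev_input)
        outputs.append(prediction)
        use_ground_truth = rng.random() < tf_prob
        prev_input = target if use_ground_truth else prediction
    return outputs
```

With `tf_prob=1.0` every step is teacher-forced; with `tf_prob=0.0` the pass is fully auto-regressive, as at inference time, when no ground-truth frames exist.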
πŸ”Ž Similar Papers
No similar papers found.