🤖 AI Summary
This work addresses Sign Language Translation (SLT) with a two-stage "detection + generation" hybrid framework, Spotter+GPT. First, a Transformer-based sign spotter detects isolated sign units in the video (spotting) with high precision. Second, the resulting symbolic sign sequence is fed to a fine-tuned or prompt-engineered GPT-series large language model (LLM), which generates grammatically correct and contextually coherent spoken-language sentences. Notably, this approach decouples sign spotting from linguistic generation: the spotter is trained on linguistically annotated spotting data, while the LLM handles high-level syntactic and semantic modeling, improving both interpretability and generalization. On the PHOENIX14-T benchmark, Spotter+GPT achieves state-of-the-art performance, surpassing leading end-to-end SLT methods in both BLEU and METEOR scores, validating the effectiveness of this paradigm.
📝 Abstract
Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos. In this paper, we introduce a hybrid SLT approach, Spotter+GPT, that utilizes a sign spotter and a powerful Large Language Model (LLM) to improve SLT performance. Spotter+GPT breaks down the SLT task into two stages. The videos are first processed by the Spotter, which is trained on a linguistic sign language dataset, to identify individual signs. These spotted signs are then passed to an LLM, which transforms them into coherent and contextually appropriate spoken language sentences. The source code of the Spotter is available at https://gitlab.surrey.ac.uk/cogvispublic/sign-spotter.
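The two-stage pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the spotter is stubbed with a fixed gloss sequence, the LLM is any callable taking a prompt string, and all function names (`spot_signs`, `build_prompt`, `translate`) are hypothetical.

```python
def spot_signs(video_frames):
    """Stage 1: a sign spotter maps video frames to a sequence of sign glosses.
    Stubbed here with a fixed output for illustration; the real spotter is a
    trained video model."""
    return ["MORGEN", "REGEN", "NORD"]  # example German Sign Language glosses

def build_prompt(glosses):
    """Build an LLM prompt asking for a coherent spoken-language sentence
    from the spotted gloss sequence (illustrative wording, not the paper's)."""
    return ("Translate the following sequence of sign glosses into a "
            "grammatical German sentence: " + " ".join(glosses))

def translate(video_frames, llm):
    """Stage 2: pass the spotted glosses to an LLM for sentence generation.
    `llm` is any callable mapping a prompt string to a response string."""
    glosses = spot_signs(video_frames)
    return llm(build_prompt(glosses))
```

A dummy LLM such as `lambda prompt: "..."` can stand in for a real API call when testing the plumbing; in practice the `llm` callable would wrap a GPT-series model endpoint.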