🤖 AI Summary
Sign Language Recognition (SLR) is fundamentally constrained by the scarcity of labeled training data. To address this, we propose SSLR, the first systematic semi-supervised framework for isolated SLR built on pseudo-labeling. SSLR takes human pose keypoints as input and employs a Transformer-based backbone, augmented with consistency regularization and a dynamic-threshold pseudo-label generation mechanism that iteratively expands the training set. By significantly reducing reliance on labeled data, SSLR achieves superior performance: on WLASL-100, it surpasses fully supervised baselines using only 20% of the labeled data; across 10%–50% labeling ratios, it yields an average accuracy improvement of 3.2%; and it remains robust in few-shot settings. This work establishes a scalable, high-performance semi-supervised paradigm for low-resource SLR, offering a practical pathway toward accurate sign language recognition systems under limited annotation budgets.
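The dynamic-threshold pseudo-labeling described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the per-class threshold scaling rule (lowering the threshold for classes selected less often, in the spirit of curriculum pseudo-labeling), and all parameter names are assumptions.

```python
import numpy as np

def select_pseudo_labels(probs, base_threshold=0.9, class_counts=None):
    """Illustrative dynamic-threshold pseudo-label selection.

    probs: (N, C) softmax outputs of the model on unlabeled samples.
    class_counts: how often each class has been pseudo-labeled so far;
    classes selected less often get a lower threshold, so rarer signs
    still contribute pseudo-labels (scaling rule is an assumption).
    """
    probs = np.asarray(probs, dtype=float)
    n, c = probs.shape
    if class_counts is None:
        class_counts = np.ones(c)
    # Scale each class's threshold by its relative selection frequency.
    scale = np.asarray(class_counts, dtype=float) / max(np.max(class_counts), 1)
    thresholds = base_threshold * scale  # rarer class -> lower threshold

    preds = probs.argmax(axis=1)          # predicted class per sample
    confidences = probs.max(axis=1)       # confidence of that prediction
    mask = confidences >= thresholds[preds]
    # Return pseudo-labels and the indices of the samples that passed.
    return preds[mask], np.nonzero(mask)[0]
```

Samples that pass their class-specific threshold are added to the labeled pool for the next training round; the rest remain unlabeled and are re-scored later.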
📝 Abstract
Sign language is the primary means of communication for people with disabling hearing loss. Sign language recognition (SLR) systems aim to recognize sign gestures and translate them into spoken language. One of the main challenges in SLR is the scarcity of annotated datasets. To address this issue, we propose a semi-supervised learning (SSL) approach for SLR (SSLR) that employs a pseudo-label method to annotate unlabeled samples. Sign gestures are represented by pose information encoding the signer's skeletal joint points, which serves as input to the Transformer backbone of the proposed approach. To demonstrate the learning capabilities of SSL across labeled-data sizes, experiments were conducted using different percentages of labeled data and varying numbers of classes. The SSL approach was compared with a fully supervised model on the WLASL-100 dataset; in many cases, the SSL model outperformed the supervised model while using less labeled data.
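The pose-based input representation can be sketched as follows: each frame's skeletal keypoints are flattened and projected to one token, yielding a sequence the Transformer backbone can consume. This is a hypothetical sketch; the joint count, embedding size, and the random projection stand in for whatever learned embedding the paper uses.

```python
import numpy as np

def pose_to_tokens(keypoints, d_model=64, rng=None):
    """Turn a keypoint sequence into Transformer input tokens (sketch).

    keypoints: (T, J, 2) array of (x, y) joint coordinates per frame.
    Each frame becomes one token: its J joints are flattened to a
    (J*2)-dim vector and linearly projected to d_model dimensions.
    The projection is random here purely for illustration; in practice
    it would be a learned embedding layer.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    t, j, c = np.asarray(keypoints).shape
    flat = np.asarray(keypoints, dtype=float).reshape(t, j * c)  # (T, J*2)
    w = rng.standard_normal((j * c, d_model)) / np.sqrt(j * c)
    return flat @ w  # (T, d_model): one token per frame
```

The resulting (T, d_model) sequence, plus positional encodings, is the standard input shape for a Transformer encoder.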