🤖 AI Summary
Current continuous sign language recognition (CSLR) datasets suffer from limited scale, narrow scene coverage, and strong recording constraints, hindering robust modeling in real-world settings. To address this, we introduce the first large-scale, multi-scene, unconstrained CSLR dataset: 30,000 videos captured by 18 deaf signers using their smartphones, spanning diverse distances, viewing angles, and resolutions, with full gloss-level manual annotations. We establish two novel benchmarks, signer-independent and unseen-sentence CSLR, along with gloss-based and gloss-free sign language translation tasks. The dataset organization and evaluation protocols are explicitly designed for continuous temporal modeling. This open-source resource enables models that generalize better across signers and realistic environments, supports rigorous evaluation of temporal dynamics and cross-domain robustness, and establishes a foundational benchmark for advancing robust CSLR and end-to-end sign language translation research.
📝 Abstract
Current benchmarks for sign language recognition (SLR) focus mainly on isolated SLR, while few datasets exist for continuous SLR (CSLR), which recognizes sequences of signs in a video. Additionally, existing CSLR datasets are collected in controlled settings, which limits their usefulness for building robust real-world CSLR systems. To address these limitations, we present Isharah, a large multi-scene dataset for CSLR. It is the first dataset of its type and size collected in an unconstrained environment using the signers' smartphone cameras. This setup resulted in high variation in recording settings, camera distances, angles, and resolutions, which supports the development of sign language understanding models capable of handling the variability and complexity of real-world scenarios. The dataset consists of 30,000 video clips performed by 18 deaf, professional signers. The dataset is also linguistically rich, providing gloss-level annotations for all of its videos, making it useful for developing both CSLR and sign language translation (SLT) systems. This paper also introduces multiple sign language understanding benchmarks, including signer-independent and unseen-sentence CSLR, along with gloss-based and gloss-free SLT. The Isharah dataset is available at https://snalyami.github.io/Isharah_CSLR/.
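As a minimal illustration of how the two CSLR evaluation protocols differ, the sketch below builds a signer-independent split (held-out signers never appear in training) and an unseen-sentence split (held-out gloss sequences never appear in training) from a hypothetical per-clip annotation table. The column names (`signer_id`, `sentence_id`, `gloss_sequence`) and the held-out IDs are illustrative assumptions, not the dataset's actual release format.

```python
# Minimal sketch of the two CSLR evaluation protocols (signer-independent and
# unseen-sentence), assuming a hypothetical per-clip annotation table.
# Column names and IDs below are illustrative, not Isharah's release format.
import pandas as pd

annotations = pd.DataFrame({
    "video_id":       ["v001", "v002", "v003", "v004"],
    "signer_id":      [1, 1, 7, 12],
    "sentence_id":    [10, 42, 10, 99],
    "gloss_sequence": [
        "HELLO YOU NAME WHAT",
        "TODAY WEATHER HOT",
        "HELLO YOU NAME WHAT",
        "TOMORROW I TRAVEL",
    ],
})

# Signer-independent split: held-out signers never appear in training,
# so models must generalize to unseen signing styles and appearances.
held_out_signers = {7, 12}  # hypothetical test signers
si_train = annotations[~annotations["signer_id"].isin(held_out_signers)]
si_test = annotations[annotations["signer_id"].isin(held_out_signers)]

# Unseen-sentence split: held-out sentences (gloss sequences) never appear
# in training, so models must recognize novel sign combinations rather than
# memorizing whole sentences seen during training.
held_out_sentences = {99}  # hypothetical test sentences
us_train = annotations[~annotations["sentence_id"].isin(held_out_sentences)]
us_test = annotations[annotations["sentence_id"].isin(held_out_sentences)]

print(si_train["video_id"].tolist(), si_test["video_id"].tolist())
print(us_train["video_id"].tolist(), us_test["video_id"].tolist())
```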