SignIT: A Comprehensive Dataset and Multimodal Analysis for Italian Sign Language Recognition

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
The lack of high-quality benchmark datasets hinders progress in Italian Sign Language (LIS) recognition. Method: This work introduces SignIT—the first large-scale, fine-grained, multimodal LIS benchmark—comprising 644 videos (3.33 hours), 94 sign classes spanning five semantic categories, and synchronized 2D keypoint annotations for the hands, face, and body. It establishes a standardized evaluation protocol and systematically benchmarks temporal models—including LSTM, Transformer, and multi-stream CNNs—on RGB, skeletal, and multimodal inputs. Contribution/Results: Experiments reveal the limited performance of unimodal (RGB-only or keypoints-only) approaches, while RGB–skeleton fusion significantly improves accuracy. Nevertheless, state-of-the-art models still show substantial limitations on authentic LIS data. SignIT is publicly released with a standardized evaluation framework and reproducible baselines, enabling rigorous advancement in sign language understanding research.
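The summary above describes fusing RGB and skeletal streams with temporal models such as LSTMs. A minimal late-fusion sketch of that idea is shown below in PyTorch; it is purely illustrative, not the paper's actual architecture, and the feature dimensions (512-d frame embeddings, 134-d flattened keypoints, i.e. an assumed 67 joints × 2 coordinates) are hypothetical placeholders.

```python
import torch
import torch.nn as nn


class RGBSkeletonFusion(nn.Module):
    """Illustrative late-fusion sign classifier (hypothetical, not the paper's model).

    Each modality is encoded by its own LSTM over per-frame features; the
    final hidden states are concatenated and mapped to the 94 sign classes.
    """

    def __init__(self, rgb_dim=512, kpt_dim=134, hidden=256, num_classes=94):
        super().__init__()
        self.rgb_lstm = nn.LSTM(rgb_dim, hidden, batch_first=True)
        self.kpt_lstm = nn.LSTM(kpt_dim, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, rgb_feats, kpt_feats):
        # rgb_feats: (B, T, rgb_dim) frame embeddings from a CNN backbone
        # kpt_feats: (B, T, kpt_dim) flattened 2D keypoints per frame
        _, (h_rgb, _) = self.rgb_lstm(rgb_feats)
        _, (h_kpt, _) = self.kpt_lstm(kpt_feats)
        fused = torch.cat([h_rgb[-1], h_kpt[-1]], dim=-1)  # late fusion
        return self.head(fused)


model = RGBSkeletonFusion()
logits = model(torch.randn(2, 16, 512), torch.randn(2, 16, 134))
print(logits.shape)  # torch.Size([2, 94])
```

Late fusion keeps the two streams independent until the classifier, which makes it easy to compare unimodal and multimodal variants under the same protocol, as the benchmark does.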

📝 Abstract
In this work we present SignIT, a new dataset to study the task of Italian Sign Language (LIS) recognition. The dataset is composed of 644 videos covering 3.33 hours. We manually annotated the videos following a taxonomy of 94 distinct sign classes belonging to 5 macro-categories: Animals, Food, Colors, Emotions and Family. We also extracted 2D keypoints for the hands, face and body of the users. With the dataset, we propose a benchmark for the sign recognition task, adopting several state-of-the-art models and showing how temporal information, 2D keypoints and RGB frames can influence the performance of these models. Results show the limitations of these models on this challenging LIS dataset. We release data and annotations at the following link: https://fpv-iplab.github.io/SignIT/.
Problem

Research questions and friction points this paper is trying to address.

Develops a dataset for Italian Sign Language recognition
Proposes a benchmark using multimodal data analysis
Evaluates model performance on challenging sign classes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset with 644 videos for Italian Sign Language recognition
Manual annotation of 94 sign classes across five macro-categories
Benchmark using temporal data, 2D keypoints, and RGB frames
Alessia Micieli
LIVE@IPLab, Department of Mathematics and Computer Science - University of Catania, Italy
Giovanni Maria Farinella
University of Catania
Computer Vision, Machine Learning
Francesco Ragusa
LIVE@IPLab, Department of Mathematics and Computer Science - University of Catania, Italy