SignIT: A Comprehensive Dataset and Multimodal Analysis for Italian Sign Language Recognition

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
The lack of high-quality benchmark datasets hinders progress in Italian Sign Language (LIS) recognition. Method: This work introduces SignIT—the first large-scale, fine-grained, multimodal LIS benchmark—comprising 644 videos (3.33 hours), 94 sign classes spanning five semantic categories, and synchronized 2D keypoint annotations for the hands, face, and body. It establishes a standardized evaluation protocol and systematically benchmarks temporal models—including LSTM, Transformer, and multi-stream CNNs—on RGB, skeletal, and multimodal inputs. Contribution/Results: Experiments reveal the limited performance of unimodal (RGB-only or keypoints-only) approaches, while RGB–skeleton fusion significantly improves accuracy. Nevertheless, state-of-the-art models still show substantial limitations on authentic LIS data. SignIT is publicly released with a standardized evaluation framework and reproducible baselines, enabling rigorous advancement in sign language understanding research.
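The summary above describes fusing RGB and skeletal streams with temporal models such as LSTMs. A minimal late-fusion sketch of that idea is shown below in PyTorch; it is purely illustrative, not the paper's actual architecture, and the feature dimensions (512-d frame embeddings, 134-d flattened keypoints, i.e. an assumed 67 joints × 2 coordinates) are hypothetical placeholders.

```python
import torch
import torch.nn as nn


class RGBSkeletonFusion(nn.Module):
    """Illustrative late-fusion sign classifier (hypothetical, not the paper's model).

    Each modality is encoded by its own LSTM over per-frame features; the
    final hidden states are concatenated and mapped to the 94 sign classes.
    """

    def __init__(self, rgb_dim=512, kpt_dim=134, hidden=256, num_classes=94):
        super().__init__()
        self.rgb_lstm = nn.LSTM(rgb_dim, hidden, batch_first=True)
        self.kpt_lstm = nn.LSTM(kpt_dim, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, rgb_feats, kpt_feats):
        # rgb_feats: (B, T, rgb_dim) frame embeddings from a CNN backbone
        # kpt_feats: (B, T, kpt_dim) flattened 2D keypoints per frame
        _, (h_rgb, _) = self.rgb_lstm(rgb_feats)
        _, (h_kpt, _) = self.kpt_lstm(kpt_feats)
        fused = torch.cat([h_rgb[-1], h_kpt[-1]], dim=-1)  # late fusion
        return self.head(fused)


model = RGBSkeletonFusion()
logits = model(torch.randn(2, 16, 512), torch.randn(2, 16, 134))
print(logits.shape)  # torch.Size([2, 94])
```

Late fusion keeps the two streams independent until the classifier, which makes it easy to compare unimodal and multimodal variants under the same protocol, as the benchmark does.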

📝 Abstract
In this work we present SignIT, a new dataset to study the task of Italian Sign Language (LIS) recognition. The dataset is composed of 644 videos covering 3.33 hours. We manually annotated the videos following a taxonomy of 94 distinct sign classes belonging to 5 macro-categories: Animals, Food, Colors, Emotions and Family. We also extracted 2D keypoints for the hands, face and body of the users. With the dataset, we propose a benchmark for the sign recognition task, adopting several state-of-the-art models and showing how temporal information, 2D keypoints and RGB frames can influence the performance of these models. Results show the limitations of these models on this challenging LIS dataset. We release data and annotations at the following link: https://fpv-iplab.github.io/SignIT/.
Problem

Research questions and friction points this paper is trying to address.

Develops a dataset for Italian Sign Language recognition
Proposes a benchmark using multimodal data analysis
Evaluates model performance on challenging sign classes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset with 644 videos for Italian Sign Language recognition
Manual annotation of 94 sign classes across five macro-categories
Benchmark using temporal data, 2D keypoints, and RGB frames
Alessia Micieli
LIVE@IPLab, Department of Mathematics and Computer Science - University of Catania, Italy
Giovanni Maria Farinella
University of Catania
Computer Vision, Machine Learning
Francesco Ragusa
LIVE@IPLab, Department of Mathematics and Computer Science - University of Catania, Italy