🤖 AI Summary
This study addresses automatic Indian Sign Language (ISL) recognition to facilitate natural communication between deaf and hearing individuals. Focusing on 30 fundamental gestures, we propose a recognition method leveraging 3D skeletal sequences, captured via Kinect as 20 joint locations per frame, alongside synchronized RGB and depth modalities. Our core contribution is a geometric-transformation-based depth-frame alignment strategy, which mitigates temporal modeling errors induced by positional shifts and hand rotations of gestures, thereby improving the spatial robustness of recurrent neural networks (RNNs). Experimental evaluation on a standard ISL benchmark dataset achieves 84.81% classification accuracy, outperforming RNN baselines that operate directly on raw skeleton sequences. The proposed spatiotemporal alignment paradigm offers a generalizable approach for low-resource sign language recognition.
📝 Abstract
Sign language is a gesture-based symbolic communication medium among people with speech and hearing impairments. It also serves as a communication bridge between impaired and non-impaired populations. Unfortunately, in most situations, a non-impaired person is not well conversant in such symbolic languages, restricting the natural flow of information between the two groups. An automated mechanism that seamlessly translates sign language into natural language can therefore be highly advantageous. In this paper, we attempt to recognize 30 basic Indian sign gestures. Gestures are represented as temporal sequences of 3D frames (RGB + depth), each frame consisting of the 3D coordinates of 20 body joints captured by a Kinect sensor. A recurrent neural network (RNN) serves as the classifier. To improve its performance, we apply a geometric transformation to correct the alignment of depth frames. In our experiments, the model achieves 84.81% accuracy.
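To illustrate the alignment idea, the sketch below normalizes each skeleton frame with a rigid geometric transformation (translation to a hip-centered origin, then rotation about the vertical axis so the shoulder line faces the sensor) before the sequence reaches the RNN. This is a minimal, hypothetical reconstruction: the paper does not specify its exact transform, and the joint indices used here are illustrative placeholders, not the Kinect skeleton layout.

```python
import numpy as np

# Illustrative joint indices (NOT the actual Kinect ordering).
HIP, L_SHOULDER, R_SHOULDER = 0, 4, 8

def align_frame(joints: np.ndarray) -> np.ndarray:
    """Align one (20, 3) joint frame: remove the positional shift by
    translating the hip joint to the origin, then rotate about the
    z-axis so the shoulder line lies along the x-axis."""
    centred = joints - joints[HIP]                 # cancel body translation
    v = centred[R_SHOULDER] - centred[L_SHOULDER]  # shoulder direction
    theta = np.arctan2(v[1], v[0])                 # its angle in the xy-plane
    c, s = np.cos(-theta), np.sin(-theta)          # rotate by -theta
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return centred @ rot.T

def align_sequence(seq: np.ndarray) -> np.ndarray:
    """Apply per-frame alignment to a (T, 20, 3) gesture sequence."""
    return np.stack([align_frame(frame) for frame in seq])
```

After this normalization, two performances of the same gesture recorded at different positions or body orientations map to near-identical coordinate sequences, which is what makes the downstream RNN's temporal modeling more robust.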