🤖 AI Summary
To address the challenges of strong cultural specificity and scarce annotated data in Bangla Sign Language (BdSL) recognition, this paper proposes a lightweight pose-driven Transformer framework. Methodologically: (i) a culturally adaptive gesture preprocessing module is introduced to capture region-specific BdSL characteristics; (ii) an optimized learnable positional encoding scheme is designed to enhance spatiotemporal modeling of skeletal keypoints; and (iii) a curriculum learning strategy is integrated to improve generalization and convergence speed under low-data regimes. The architecture employs only four compact Transformer encoder layers, significantly reducing parameter count and FLOPs. Evaluated on the BdSLW60 benchmark, the model achieves 97.92% Top-1 accuracy while accelerating inference by 32% over prior approaches. This work delivers an efficient, deployable solution for low-resource sign language recognition, balancing accuracy, computational efficiency, and cultural adaptability.
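Point (ii) can be illustrated with a minimal sketch: unlike fixed sinusoidal encodings, a learnable positional encoding is a trainable table of per-frame offsets added to the flattened keypoint features before they enter the transformer encoder. The dimensions and variable names below are hypothetical assumptions for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 60 frames, 54 2-D keypoints flattened to 108 features.
T, D = 60, 108
frames = rng.normal(size=(T, D))                  # per-frame pose keypoint features
pos_embed = rng.normal(scale=0.02, size=(T, D))   # learnable table, updated by backprop in training

# Position-aware input to the transformer encoder: one learned offset per frame index.
x = frames + pos_embed
assert x.shape == (T, D)
```

During training the `pos_embed` table is optimized jointly with the encoder weights, letting the model learn frame-order structure suited to signing dynamics rather than a fixed schedule.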
📝 Abstract
We introduce BdSL-SPOTER, a pose-based transformer framework for accurate and efficient recognition of Bengali Sign Language (BdSL). BdSL-SPOTER extends the SPOTER paradigm with culture-specific preprocessing and a compact four-layer transformer encoder featuring optimized learnable positional encodings, while employing curriculum learning to enhance generalization on limited data and accelerate convergence. On the BdSLW60 benchmark, it achieves 97.92% Top-1 validation accuracy, a 22.82% improvement over the Bi-LSTM baseline, all while keeping computational costs low. With its reduced parameter count, lower FLOPs, and higher FPS, BdSL-SPOTER provides a practical framework for real-world accessibility applications and a scalable template for other low-resource regional sign languages.
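The curriculum learning strategy can be sketched as an easy-to-hard schedule that exposes a growing pool of training samples stage by stage. The staging function and difficulty scoring below are illustrative assumptions, not the paper's exact procedure.

```python
def curriculum_stages(samples, difficulty, n_stages=3):
    """Sort samples easy-to-hard and expose a growing prefix per stage.

    `difficulty` maps a sample to a scalar score (lower = easier);
    how difficulty is scored is an assumption, not from the paper.
    """
    ordered = sorted(samples, key=difficulty)
    return [ordered[: len(ordered) * s // n_stages] for s in range(1, n_stages + 1)]

# Toy example: four clips with hypothetical difficulty scores.
stages = curriculum_stages([3, 1, 4, 2], difficulty=lambda x: x, n_stages=2)
# stages[0] holds the easiest half; stages[-1] holds all samples.
```

Training iterates over `stages` in order, so early epochs see only easy samples; this ordering is one plausible way such a schedule could speed convergence under the low-data regime the abstract describes.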