Skeleton-based sign language recognition using a dual-stream spatio-temporal dynamic graph convolutional network

📅 2025-09-10
🤖 AI Summary
Isolated Sign Language Recognition (ISLR) suffers from geometric ambiguity that arises from the strong coupling between hand shape and motion trajectory, which makes it hard to discriminate signs that are semantically distinct but morphologically similar. To address this, the authors propose Dual-SignLanguageNet (DSLNet), a dual-stream architecture that uses a dual-reference coordinate system, centered at the wrist and at the face, to decouple shape and motion modeling. One stream performs view-invariant shape analysis in the wrist-centric frame with a topology-aware graph convolution; the other models context-aware trajectories in the facial-centric frame with a Finsler geometry-based encoder. A geometry-driven optimal transport fusion mechanism integrates the features of the two streams. DSLNet reports state-of-the-art accuracy of 93.70%, 89.97%, and 99.79% on WLASL-100, WLASL-300, and LSA64, respectively, with significantly fewer parameters than competing models.

📝 Abstract
Isolated Sign Language Recognition (ISLR) is challenged by gestures that are morphologically similar yet semantically distinct, a problem rooted in the complex interplay between hand shape and motion trajectory. Existing methods, which often rely on a single reference frame, struggle to resolve this geometric ambiguity. This paper introduces Dual-SignLanguageNet (DSLNet), a dual-reference, dual-stream architecture that decouples gesture morphology from trajectory and models each in a separate, complementary coordinate system. Our approach uses a wrist-centric frame for view-invariant shape analysis and a facial-centric frame for context-aware trajectory modeling. The two streams are processed by specialized networks (a topology-aware graph convolution for shape and a Finsler geometry-based encoder for trajectory) and are integrated via a geometry-driven optimal transport fusion mechanism. DSLNet sets a new state of the art, achieving 93.70%, 89.97%, and 99.79% accuracy on the challenging WLASL-100, WLASL-300, and LSA64 datasets, respectively, with significantly fewer parameters than competing models.
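
To make the dual-reference idea concrete, the following is a minimal sketch of how 2D skeleton keypoints might be re-expressed in a wrist-centric frame (for shape) and a facial-centric frame (for trajectory). It is not the paper's implementation: the function names, the choice of index 0 as the wrist, and the scale normalizations are assumptions made for illustration, and the sketch only removes translation and scale, not rotation.

```python
import math

def to_wrist_frame(hand_points, wrist_index=0):
    """Hypothetical helper: express hand keypoints relative to the wrist.

    hand_points is a list of (x, y) tuples for one frame. The wrist becomes
    the origin and the hand is scaled so its farthest keypoint has length 1,
    which removes the hand's position and apparent size from the shape.
    """
    wx, wy = hand_points[wrist_index]
    centered = [(x - wx, y - wy) for x, y in hand_points]
    # Fall back to 1.0 if every keypoint coincides with the wrist.
    scale = max(math.hypot(x, y) for x, y in centered) or 1.0
    return [(x / scale, y / scale) for x, y in centered]

def to_face_frame(wrist_track, face_centers, face_size):
    """Hypothetical helper: express a wrist trajectory relative to the face.

    wrist_track and face_centers hold one (x, y) tuple per video frame;
    face_size is an assumed scalar (for example, a face width in pixels)
    used as the unit of length.
    """
    return [((x - fx) / face_size, (y - fy) / face_size)
            for (x, y), (fx, fy) in zip(wrist_track, face_centers)]

# The same hand shape at two image positions yields identical wrist-frame
# coordinates, while the face frame keeps where the hand is relative to the face.
hand = [(10, 10), (13, 14), (10, 15)]
shifted = [(x + 100, y + 50) for x, y in hand]
shape_a = to_wrist_frame(hand)
shape_b = to_wrist_frame(shifted)
trajectory = to_face_frame([(10, 10), (12, 10)], [(10, 6), (10, 6)], face_size=2.0)
```

Keeping the two normalizations separate is the point of the sketch: the wrist frame discards where the hand is, and the face frame retains exactly that information.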
Problem

Research questions and friction points this paper is trying to address.

Resolving geometric ambiguity in similar sign gestures
Decoupling gesture morphology and motion trajectory modeling
Achieving view-invariant and context-aware sign recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-stream architecture decouples gesture morphology and trajectory
Uses wrist-centric and facial-centric frames for complementary analysis
Integrates streams via geometry-driven optimal transport fusion
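
The summary does not specify how the optimal transport fusion is formulated. As a rough illustration of the general idea only, the sketch below uses standard entropy-regularized optimal transport (Sinkhorn iterations) to align trajectory-stream features to shape-stream features and then averages them. The squared-Euclidean cost, uniform marginals, the 0.5/0.5 averaging, and all function names are assumptions, not the paper's method.

```python
import math

def squared_distance_cost(xs, ys):
    """Pairwise squared Euclidean distance between two lists of feature vectors."""
    return [[sum((p - q) ** 2 for p, q in zip(x, y)) for y in ys] for x in xs]

def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropy-regularized transport plan with uniform marginals.

    Note: exp(-cost / eps) can underflow for large costs; a log-domain
    variant would be needed for real feature scales.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on the shape-stream features
    b = [1.0 / m] * m  # uniform mass on the trajectory-stream features
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def fuse(shape_feats, traj_feats, eps=0.1):
    """Average each shape feature with its transport-weighted trajectory feature."""
    plan = sinkhorn_plan(squared_distance_cost(shape_feats, traj_feats), eps)
    fused = []
    for i, s in enumerate(shape_feats):
        row_mass = sum(plan[i])
        # Barycentric projection: plan-weighted mean of trajectory features.
        aligned = [sum(plan[i][j] * traj_feats[j][d] for j in range(len(traj_feats))) / row_mass
                   for d in range(len(s))]
        fused.append([0.5 * (sv + av) for sv, av in zip(s, aligned)])
    return fused

shape_feats = [[0.0, 0.0], [1.0, 1.0]]
traj_feats = [[0.0, 0.0], [1.0, 1.0]]
plan = sinkhorn_plan(squared_distance_cost(shape_feats, traj_feats))
fused = fuse(shape_feats, traj_feats)
```

In this toy case the plan concentrates its mass on matching pairs, so each fused vector stays close to its shape feature; with real features the plan would spread mass according to the cost.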
Liangjin Liu
School of Computer Science, Sichuan University
Haoyang Zheng
Purdue University
Artificial Intelligence, Machine Learning, Deep Learning, Generative Models
Pei Zhou
School of Computer Science, Sichuan University