🤖 AI Summary
This work addresses the needs of the Deaf community by pioneering the positive application of deepfake technology to generate high-quality, linguistically valid sign language videos, while concurrently establishing a generalizable detection benchmark. Methodologically, we propose a pose-driven upper-body video generation framework that integrates generative adversarial networks (GANs) with optical flow and keypoint consistency constraints, and employ a multi-dimensional evaluation combining natural language processing (NLP) metrics—such as sign order and grammatical correctness—with computer vision (CV) metrics. Our contributions are threefold: (1) the first expert-validated sign language deepfake benchmark dataset, comprising over 1,200 videos spanning both known and unseen signers; (2) generated videos achieving a 92% expert validation pass rate; and (3) strong performance on the detection task, attaining an AUC of 0.91 in cross-signer deepfake identification.
📝 Abstract
An emerging question in deepfake research is whether the technology can go beyond facial deepfakes and whether doing so would benefit society. This research therefore presents a positive application of deepfake technology: generating upper-body videos of a person performing sign language for the Deaf and Hard of Hearing (DHoH) community. The resulting videos are then vetted by a sign language expert. This is particularly helpful given the intricate nature of sign language, the scarcity of sign language experts, and the potential benefits for health and education. The objectives of this work are to construct a reliable deepfake dataset, evaluate its technical and visual credibility using computer vision and natural language processing models, and assess the plausibility of the generated content. With over 1200 videos featuring individuals both seen and unseen by the generation model, and with the help of a sign language expert, we establish a sign language deepfake dataset that can further be used to detect fake videos that may target certain people of determination.