🤖 AI Summary
This work addresses high-fidelity rendering of sign language video from only a few views, with particular emphasis on modeling fine-grained hand and facial dynamics, where conventional Gaussian rasterization lacks the expressiveness to capture such subtle, highly dynamic human motion. The proposed method is a mesh-constrained Gaussian fitting framework that combines parametric regularization with surface-adaptive Gaussian densification and pruning, and couples sequence-level temporal modeling with neural machine translation-driven sign stitching to synthesize novel sign sequences. Evaluated on standard sign language benchmarks, the approach achieves state-of-the-art performance, significantly improving geometric fidelity and motion coherence on high-degree-of-freedom, highly dynamic sign articulations. By enabling accurate reconstruction from sparse views, it offers a scalable, high-precision paradigm for low-resource sign language modeling.
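To make the mesh-constrained fitting concrete, the sketch below shows one standard way such a constraint can be realized: each Gaussian is anchored to a triangle of the posed parametric body mesh (e.g. an SMPL-X-style model) via barycentric coordinates plus a small learned offset along the face normal, so every splat follows the articulated surface. This is a minimal PyTorch sketch under those assumptions; the function and parameter names are illustrative and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def gaussian_centers_from_mesh(vertices, faces, face_ids, barycentric, normal_offset):
    """Place each Gaussian on a posed mesh triangle (illustrative sketch).

    vertices:      (V, 3) posed mesh vertices (e.g. from a SMPL-X-style forward pass)
    faces:         (F, 3) long tensor of triangle vertex indices
    face_ids:      (N,)   long tensor: which triangle each Gaussian is tied to
    barycentric:   (N, 3) barycentric coordinates (rows sum to 1)
    normal_offset: (N, 1) learned signed offset along the face normal
    """
    tris = vertices[faces[face_ids]]                            # (N, 3, 3)
    # Convex combination of triangle vertices = point on the surface.
    on_surface = (barycentric.unsqueeze(-1) * tris).sum(dim=1)  # (N, 3)
    # Unit face normal for a small learned offset off the surface.
    normals = torch.cross(tris[:, 1] - tris[:, 0],
                          tris[:, 2] - tris[:, 0], dim=1)
    normals = F.normalize(normals, dim=1)
    return on_surface + normal_offset * normals
```

Because the centers are differentiable functions of the posed vertices, re-posing the mesh moves every Gaussian consistently, which is the property sequence-level training relies on for appearance consistency across motion.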
📝 Abstract
State-of-the-art approaches to conditional human body rendering via Gaussian splatting typically focus on simple body motions, such as dancing or walking, captured from many views. However, for more complex use cases such as sign language, we care less about large body motion and more about the subtle, complex motions of the hands and face. The problem of building high-fidelity models is compounded by the difficulty of capturing multi-view data of sign. Our solution is to make better use of sequence data, overcoming the limited information available from only a few views by exploiting temporal variability. Learning from sequence-level data, however, requires extremely accurate and consistent model fitting to ensure that appearance remains consistent across complex motions. We focus on how to achieve this, constraining mesh parameters to build an accurate Gaussian splatting framework from few views that is capable of modelling subtle human motion. We apply regularization to the Gaussian parameters to mitigate overfitting and rendering artifacts, and we propose a new adaptive control method to densify Gaussians and prune splat points on the mesh surface. To demonstrate the accuracy of our approach, we render novel sequences of sign language video, building on neural machine translation approaches to sign stitching. Our approach achieves state-of-the-art performance on benchmark datasets, and on highly articulated, complex sign language motion it significantly outperforms competing approaches.
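As a concrete illustration of the two mechanisms named above, regularizing the Gaussian parameters and adaptively controlling density on the mesh surface, here is a minimal PyTorch sketch. It assumes the barycentric parameterization from the earlier sketch, and the weights, thresholds, and function names are hypothetical stand-ins rather than the paper's actual choices.

```python
import torch

def gaussian_regularizers(scales, normal_offset, w_aniso=0.01, w_offset=0.1):
    """Hypothetical regularizers: penalize elongated splats (a common source
    of rendering artifacts) and drift away from the mesh surface."""
    aniso = scales.max(dim=1).values / scales.min(dim=1).values.clamp_min(1e-6)
    return w_aniso * (aniso - 1.0).mean() + w_offset * normal_offset.abs().mean()

def adapt_density_on_surface(barycentric, face_ids, opacity, grad_norm,
                             grad_thresh=2e-4, opacity_thresh=0.01):
    """Hypothetical adaptive control: clone splats where accumulated view-space
    gradients are large and prune nearly transparent ones. New points are made
    by jittering barycentric coordinates, so they stay on the mesh surface."""
    keep = opacity.squeeze(-1) > opacity_thresh
    split = grad_norm > grad_thresh
    # Jitter in log-space and renormalize with softmax so the new barycentric
    # coordinates remain a valid convex combination on the same triangle.
    jitter = 0.05 * torch.randn_like(barycentric[split])
    new_bary = torch.softmax(barycentric[split].clamp_min(1e-6).log() + jitter,
                             dim=1)
    barycentric = torch.cat([barycentric[keep], new_bary], dim=0)
    face_ids = torch.cat([face_ids[keep], face_ids[split]], dim=0)
    # Other per-Gaussian attributes (scales, opacities, colors) would be
    # filtered and duplicated with the same masks.
    return barycentric, face_ids
```

Keeping densification on the surface is what distinguishes a scheme like this from standard 3D Gaussian Splatting density control, which clones and splits points freely in space.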