🤖 AI Summary
This work addresses the incompatibility of existing speech-driven 3D facial animation methods with industrial production pipelines, which hinders their deployment in mainstream game engines. We propose an ARKit-compatible system that can be directly deployed in Unreal Engine, leveraging a pipeline that converts the MEAD dataset into blendshape sequences using MediaPipe, followed by retraining of the FaceDiffuser and ProbTalk3D-X models. Additionally, we develop a modular Unreal Engine plugin with a Python backend that supports emotion control and stochastic variation. A perceptual user study compares our system with commercial solutions such as Epic MetaHuman and NVIDIA Audio2Face. To facilitate the transition from academic research to production-grade applications, we release both the processed dataset and the plugin architecture as open-source resources.
📝 Abstract
Speech-driven 3D facial animation research has shown promising results, but most methods rely on representations that are incompatible with production pipelines. In this work, we present a deployable system that bridges this gap by enabling speech-driven 3D facial animation directly in Unreal Engine (UE) using ARKit-compatible representations. We construct the 3DMEAD-ARKit dataset by converting the MEAD corpus into blendshape sequences using MediaPipe, and retrain FaceDiffuser and ProbTalk3D-X to generate stochastic, emotion-controllable animations. We further develop a modular UE plugin with a Python backend that supports model selection and parameter control. In a perceptual user study, we compare our results with two existing commercial tools: Epic Games' MetaHuman speech-driven animator and NVIDIA Audio2Face. The results highlight the importance of comparing academic and commercial pipelines. We recommend watching the supplementary video. We also plan to give live demonstrations of our work at the SIGGRAPH 2026 conference.