AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation

📅 2023-10-11
🏛️ IEEE Transactions on Multimedia
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing speech-driven 3D facial animation methods struggle to model speaker-specific talking styles, yielding expressions and head poses that lack vitality and personalization. To address this, the authors propose AdaMesh, a framework that personalizes speech-driven animation from roughly 10 seconds of reference video. First, MoLoRA (mixture of low-rank adaptation) efficiently fine-tunes an expression adapter to capture identity-specific expression styles. Second, a fine-tuning-free, semantics-aware pose style retrieval module combines a discrete pose prior with semantically aligned style embeddings to synthesize natural, controllable head motion. By combining LoRA, a mixture-of-experts (MoE) design, and discrete pose representations, the method achieves state-of-the-art expressiveness, style fidelity, and audio-visual synchronization in both objective metrics and human perceptual evaluation.
📝 Abstract
Speech-driven 3D facial animation aims at generating facial movements that are synchronized with the driving speech, which has been widely explored recently. Existing works mostly neglect the person-specific talking style in generation, including facial expression and head pose styles. Several works intend to capture the personalities by fine-tuning modules. However, limited training data leads to the lack of vividness. In this work, we propose AdaMesh, a novel adaptive speech-driven facial animation approach, which learns the personalized talking style from a reference video of about 10 seconds and generates vivid facial expressions and head poses. Specifically, we propose mixture-of-low-rank adaptation (MoLoRA) to fine-tune the expression adapter, which efficiently captures the facial expression style. For the personalized pose style, we propose a pose adapter by building a discrete pose prior and retrieving the appropriate style embedding with a semantic-aware pose style matrix without fine-tuning. Extensive experimental results show that our approach outperforms state-of-the-art methods, preserves the talking style in the reference video, and generates vivid facial animation. The supplementary video and code will be available at https://adamesh.github.io.
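The MoLoRA idea described in the abstract, a gated mixture of low-rank (LoRA) adapters added to a frozen base layer, can be sketched as follows. All dimensions, the softmax gating, and the zero-initialization of the `B` factors are illustrative assumptions for a single linear layer, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 64, 64        # hypothetical layer size
n_experts, rank = 4, 2      # hypothetical number of adapters and LoRA rank

W = rng.standard_normal((d_out, d_in)) * 0.02  # frozen pretrained weight

# Per-expert LoRA factors: each update B[k] @ A[k] has rank <= `rank`.
A = rng.standard_normal((n_experts, rank, d_in)) * 0.02
B = np.zeros((n_experts, d_out, rank))  # zero-init: adapters start as a no-op

# Gating weights over experts (fixed softmax here; learned in practice).
logits = rng.standard_normal(n_experts)
gates = np.exp(logits) / np.exp(logits).sum()

def molora_forward(x):
    """Frozen base layer plus a gated sum of low-rank adapter updates."""
    base = W @ x
    delta = sum(g * (B[k] @ (A[k] @ x)) for k, g in enumerate(gates))
    return base + delta

x = rng.standard_normal(d_in)
y = molora_forward(x)
# With zero-initialized B, the mixture contributes nothing before fine-tuning.
assert np.allclose(y, W @ x)
```

Only the small `A`/`B` factors and the gates would be trained during adaptation, which is why a ~10-second reference clip can suffice.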
Problem

Research questions and friction points this paper is trying to address.

Generating personalized 3D facial animation from speech
Capturing individual facial expression and head pose styles
Adapting to a new speaker from minimal reference video
Innovation

Methods, ideas, or system contributions that make the work stand out.

AdaMesh adapts speech-driven 3D facial animation to a speaker's personal talking style.
MoLoRA fine-tunes the expression adapter efficiently, capturing vivid expression styles from limited data.
The pose adapter retrieves pose styles via a semantic-aware style matrix, with no fine-tuning required.
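The fine-tuning-free pose adapter retrieves a stored style embedding rather than learning one. A minimal retrieval sketch, using plain cosine similarity as a stand-in for the paper's semantic-aware pose style matrix (the bank size, dimensionality, and query construction are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical bank of pose style embeddings, one per reference speaker.
n_styles, dim = 8, 32
style_bank = rng.standard_normal((n_styles, dim))

def retrieve_style(query, bank):
    """Return the index and embedding of the stored style most similar
    to the query (cosine similarity as a simple matching score)."""
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    idx = int(np.argmax(b @ q))
    return idx, bank[idx]

# A query derived from speaker 3's reference video should retrieve style 3.
query = style_bank[3].copy()
idx, emb = retrieve_style(query, style_bank)
assert idx == 3
```

Because retrieval replaces gradient updates, adapting the head pose style to a new speaker costs one similarity search instead of a fine-tuning run.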
Liyang Chen
Tsinghua University
Multimodal Video Generation · Speech Synthesis
Weihong Bao
Shenzhen International Graduate School, Tsinghua University
Shunwei Lei
Shenzhen International Graduate School, Tsinghua University
Boshi Tang
Shenzhen International Graduate School, Tsinghua University
Zhiyong Wu
Shenzhen International Graduate School, Tsinghua University
Shiyin Kang
Skywork AI PTE. LTD.
Haozhi Huang
XVerse Inc.
Helen M. Meng
The Chinese University of Hong Kong