EmbedTalk: Triplane-Free Talking Head Synthesis using Embedding-Driven Gaussian Deformation

📅 2026-03-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing triplane-based methods for talking head synthesis, which suffer from resolution constraints and projection approximation errors. The authors propose a novel, triplane-free approach that efficiently models speech-driven facial dynamics by introducing learnable embeddings to drive temporal deformations of 3D Gaussian points. By eliminating the conventional triplane encoding architecture, the method achieves significantly improved model compactness and inference efficiency. Experimental results demonstrate that the proposed technique outperforms current 3D Gaussian Splatting (3DGS) approaches in rendering quality, lip-sync accuracy, and motion consistency, while achieving real-time performance with over 60 FPS on an RTX 2060 mobile GPU.
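The embedding-driven deformation described above can be sketched in simplified form. This is an illustrative reconstruction, not the authors' implementation: the dimensions, the tiny MLP, and the audio-feature interface are all hypothetical, but it shows the core idea of each Gaussian carrying its own learnable latent code (in place of a tri-plane lookup) that, together with a per-frame audio feature, drives a predicted offset of the Gaussian centres.

```python
import numpy as np

rng = np.random.default_rng(0)

N_POINTS, EMBED_DIM, AUDIO_DIM, HIDDEN = 1024, 16, 32, 64

# Canonical 3D Gaussian centres (hypothetical initialisation).
means = rng.standard_normal((N_POINTS, 3))

# Per-Gaussian learnable embeddings replace the tri-plane encoding:
# each point stores its own latent code rather than sampling features
# from three projected 2D planes.
point_embed = rng.standard_normal((N_POINTS, EMBED_DIM)) * 0.01

# A small MLP maps [embedding | audio feature] -> per-point xyz offset.
W1 = rng.standard_normal((EMBED_DIM + AUDIO_DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, 3)) * 0.1
b2 = np.zeros(3)

def deform(audio_feat: np.ndarray) -> np.ndarray:
    """Deform the canonical means given one frame's audio feature."""
    x = np.concatenate(
        [point_embed, np.tile(audio_feat, (N_POINTS, 1))], axis=1)
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU hidden layer
    offsets = h @ W2 + b2             # per-Gaussian displacement
    return means + offsets

frame_audio = rng.standard_normal(AUDIO_DIM)
deformed = deform(frame_audio)
print(deformed.shape)  # (1024, 3)
```

Because the per-point codes are queried directly rather than through a fixed-resolution grid, there is no plane resolution to tune and no 3D-to-2D projection error, which is where the compactness and speed gains in the summary come from.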

📝 Abstract
Real-time talking head synthesis increasingly relies on deformable 3D Gaussian Splatting (3DGS) due to its low latency. Tri-planes are the standard choice for encoding Gaussians prior to deformation, since they provide a continuous domain with explicit spatial relationships. However, tri-plane representations are limited by grid resolution and approximation errors introduced by projecting 3D volumetric fields onto 2D subspaces. Recent work has shown the superiority of learnt embeddings for driving temporal deformations in 4D scene reconstruction. We introduce $\textbf{EmbedTalk}$, which shows how such embeddings can be leveraged for modelling speech deformations in talking head synthesis. Through comprehensive experiments, we show that EmbedTalk outperforms existing 3DGS-based methods in rendering quality, lip synchronisation, and motion consistency, while remaining competitive with state-of-the-art generative models. Moreover, replacing the tri-plane encoding with learnt embeddings enables significantly more compact models that achieve over 60 FPS on a mobile GPU (RTX 2060 6 GB). Our code will be placed in the public domain on acceptance.
Problem

Research questions and friction points this paper is trying to address.

talking head synthesis
3D Gaussian Splatting
tri-plane representation
embedding-driven deformation
real-time rendering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Embedding-Driven Deformation
Triplane-Free
3D Gaussian Splatting
Talking Head Synthesis
Real-time Rendering
Arpita Saggar
University of Leeds
Jonathan C. Darling
Leeds Institute of Medical Education, School of Medicine, University of Leeds
Duygu Sarikaya
University of Leeds, School of Computer Science
computer assisted surgery · medical image computing · computer vision
David C. Hogg
School of Computer Science, University of Leeds