Splat-Portrait: Generalizing Talking Heads with Gaussian Splatting

📅 2026-01-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limitations of existing 3D talking head generation methods, which rely on heuristic facial motion priors and often produce inaccurate reconstructions and distorted animations. To overcome these issues, we propose the first end-to-end framework that integrates Gaussian Splatting into this task, enabling the synthesis of photorealistic talking videos from a single portrait image and an audio clip without requiring 3D supervision or facial landmark annotations. Our approach jointly optimizes 2D reconstruction loss and score distillation loss to simultaneously model a static 3D head representation and audio-driven dynamic lip motions, while automatically disentangling the foreground head from the 2D background. Experiments demonstrate that our method significantly outperforms state-of-the-art approaches in both talking head generation and novel view synthesis, yielding videos with superior visual fidelity.

📝 Abstract
Talking Head Generation aims at synthesizing natural-looking talking videos from speech and a single portrait image. Previous 3D talking head generation methods have relied on domain-specific heuristics such as warping-based facial motion representation priors to animate talking motions, yet still produce inaccurate 3D avatar reconstructions, thus undermining the realism of generated animations. We introduce Splat-Portrait, a Gaussian-splatting-based method that addresses the challenges of 3D head reconstruction and lip motion synthesis. Our approach automatically learns to disentangle a single portrait image into a static 3D reconstruction represented with static Gaussian Splatting, and a predicted whole-image 2D background. It then generates natural lip motion conditioned on input audio, without any motion-driven priors. Training is driven purely by 2D reconstruction and score-distillation losses, without 3D supervision or landmark annotations. Experimental results demonstrate that Splat-Portrait exhibits superior performance on talking head generation and novel view synthesis, achieving better visual quality compared to previous works. Our project code and supplementary documents are publicly available at https://github.com/stonewalking/Splat-portrait.
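The abstract states that training is driven purely by a 2D reconstruction loss and a score-distillation loss, with no 3D supervision or landmarks. A minimal sketch of how such a combined objective might look is below; the function name, the simplified SDS surrogate (matching a frozen diffusion model's denoised prediction rather than injecting the SDS gradient directly), and the weighting `lambda_sds` are all illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def splat_portrait_loss(rendered, target, diffusion_pred, lambda_sds=0.1):
    """Hypothetical training objective combining the two signals the
    abstract describes: a 2D photometric reconstruction loss against the
    input portrait frame, plus a score-distillation term pulling the
    Gaussian-splat rendering toward a frozen diffusion model's output.
    All names and the weighting are assumptions for illustration."""
    # Photometric 2D reconstruction loss on the rendered frame.
    l_recon = F.l1_loss(rendered, target)
    # Simplified SDS surrogate: match the diffusion prediction without
    # backpropagating through the diffusion network (hence .detach()).
    l_sds = F.mse_loss(rendered, diffusion_pred.detach())
    return l_recon + lambda_sds * l_sds
```

In practice, score distillation is usually applied by injecting the diffusion model's noise-residual gradient into the rendering directly; the detached MSE above is a common simplified stand-in.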
Problem

Research questions and friction points this paper is trying to address.

Talking Head Generation
3D Head Reconstruction
Lip Motion Synthesis
Gaussian Splatting
Portrait Animation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Splatting
Talking Head Generation
3D Reconstruction
Audio-Driven Animation
Score Distillation
Tong Shi
School of Computing Science, University of Glasgow
Melonie de Almeida
School of Computing Science, University of Glasgow
D. Ivanova
School of Computing Science, University of Glasgow
Nicolas Pugeault
Reader, School of Computing Science, University of Glasgow
Computer Vision · Machine Learning · Cognitive Robotics
Paul Henderson
University of Glasgow
computer vision · machine learning