Splat-Portrait: Generalizing Talking Heads with Gaussian Splatting

📅 2026-01-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limitations of existing 3D talking head generation methods, which rely on heuristic facial motion priors and often produce inaccurate reconstructions and distorted animations. To overcome these issues, we propose the first end-to-end framework that integrates Gaussian Splatting into this task, enabling the synthesis of photorealistic talking videos from a single portrait image and an audio clip without requiring 3D supervision or facial landmark annotations. Our approach jointly optimizes 2D reconstruction loss and score distillation loss to simultaneously model a static 3D head representation and audio-driven dynamic lip motions, while automatically disentangling the foreground head from the 2D background. Experiments demonstrate that our method significantly outperforms state-of-the-art approaches in both talking head generation and novel view synthesis, yielding videos with superior visual fidelity.

📝 Abstract
Talking Head Generation aims at synthesizing natural-looking talking videos from speech and a single portrait image. Previous 3D talking head generation methods have relied on domain-specific heuristics such as warping-based facial motion representation priors to animate talking motions, yet still produce inaccurate 3D avatar reconstructions, thus undermining the realism of generated animations. We introduce Splat-Portrait, a Gaussian-splatting-based method that addresses the challenges of 3D head reconstruction and lip motion synthesis. Our approach automatically learns to disentangle a single portrait image into a static 3D reconstruction represented with static Gaussian Splatting, and a predicted whole-image 2D background. It then generates natural lip motion conditioned on input audio, without any motion-driven priors. Training is driven purely by 2D reconstruction and score-distillation losses, without 3D supervision or landmark annotations. Experimental results demonstrate that Splat-Portrait exhibits superior performance on talking head generation and novel view synthesis, achieving better visual quality compared to previous works. Our project code and supplementary documents are publicly available at https://github.com/stonewalking/Splat-portrait.
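The abstract states that training is driven purely by a 2D reconstruction loss and a score-distillation loss, with no 3D supervision or landmarks. A minimal sketch of how such a combined objective might look is below; the function name, the simplified SDS surrogate (matching a frozen diffusion model's denoised prediction rather than injecting the SDS gradient directly), and the weighting `lambda_sds` are all illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def splat_portrait_loss(rendered, target, diffusion_pred, lambda_sds=0.1):
    """Hypothetical training objective combining the two signals the
    abstract describes: a 2D photometric reconstruction loss against the
    input portrait frame, plus a score-distillation term pulling the
    Gaussian-splat rendering toward a frozen diffusion model's output.
    All names and the weighting are assumptions for illustration."""
    # Photometric 2D reconstruction loss on the rendered frame.
    l_recon = F.l1_loss(rendered, target)
    # Simplified SDS surrogate: match the diffusion prediction without
    # backpropagating through the diffusion network (hence .detach()).
    l_sds = F.mse_loss(rendered, diffusion_pred.detach())
    return l_recon + lambda_sds * l_sds
```

In practice, score distillation is usually applied by injecting the diffusion model's noise-residual gradient into the rendering directly; the detached MSE above is a common simplified stand-in.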
Problem

Research questions and friction points this paper is trying to address.

Talking Head Generation
3D Head Reconstruction
Lip Motion Synthesis
Gaussian Splatting
Portrait Animation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Splatting
Talking Head Generation
3D Reconstruction
Audio-Driven Animation
Score Distillation
Tong Shi
School of Computing Science, University of Glasgow
Melonie de Almeida
School of Computing Science, University of Glasgow
D. Ivanova
School of Computing Science, University of Glasgow
Nicolas Pugeault
Reader, School of Computing Science, University of Glasgow
Computer Vision · Machine Learning · Cognitive Robotics
Paul Henderson
University of Glasgow
computer vision · machine learning