🤖 AI Summary
This work addresses the challenge of generating multi-view-consistent, high-fidelity 3D faces from a single unconstrained image, a task where existing methods often suffer from geometric inconsistency or identity distortion on out-of-distribution data. The authors propose SplatShot, a training-free framework that explicitly couples 3D Gaussian splatting with pretrained 2D diffusion priors inside the diffusion denoising process. A 3D feedback loop runs at every denoising step: the method predicts clean multi-view images, refits the 3D Gaussians to them, and back-propagates the photometric error between the re-renderings and the predictions into the noise estimate. This yields identity-faithful, geometrically consistent, and photorealistic 3D face reconstruction. Experiments on in-the-wild images indicate that SplatShot improves on existing approaches in identity preservation, realism, and cross-view consistency.
📝 Abstract
Reconstructing a photorealistic 3D face avatar from a single unconstrained photograph is challenging: feed-forward 3D Gaussian Splatting (3DGS) models degrade on out-of-distribution inputs, while pretrained diffusion models produce high-fidelity images but lack multi-view consistency. We observe that these paradigms are fundamentally complementary: explicit 3D representations guarantee geometric consistency, whereas 2D diffusion priors ensure photorealism. Building on this, we propose SplatShot, a training-free framework that couples these representations directly within the denoising process. Given a base 3DGS face model and a single reference image, we jointly denoise all target views using a per-step 3D feedback loop. At each timestep, we predict clean images from the noisy latents, refit the 3DGS to these multi-view predictions, and back-propagate the photometric discrepancy between the 3DGS re-renderings and 2D predictions into the noise estimate. This steers the sampling trajectory toward strictly 3D-coherent, identity-faithful outputs. Experiments on diverse in-the-wild images demonstrate that SplatShot produces 3D avatars with superior identity preservation, photorealism, and multi-view consistency.
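The per-step feedback loop described above can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a hypothetical linear renderer `A` in place of 3DGS, a stub `predict_noise` in place of a pretrained diffusion model, a deterministic DDIM-style update (the abstract does not name a sampler), a re-rendering that is held fixed when taking the gradient, and a step size chosen so that the clean prediction moves a fraction `weight` toward the re-rendering. All names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, K, T = 3, 8, 4, 10   # views, pixels per view, 3D parameters, steps

# Hypothetical linear "renderer": view v of 3D parameters g is A[v] @ g.
A = rng.normal(size=(V, D, K))
A_stacked = A.reshape(V * D, K)

# Images a 2D prior might favour: a shared 3D signal plus view-specific
# detail that no single 3D model can explain (the inconsistency).
targets = A @ rng.normal(size=K) + 0.5 * rng.normal(size=(V, D))

alpha_bar = np.linspace(0.05, 0.95, T)   # signal level, noisy -> clean


def predict_noise(x_t, a):
    """Stub denoiser (assumption): its implied clean image is `targets`."""
    return (x_t - np.sqrt(a) * targets) / np.sqrt(1.0 - a)


def refit_and_render(x0_hat):
    """Fit one shared 3D parameter vector to all views, then re-render."""
    g, *_ = np.linalg.lstsq(A_stacked, x0_hat.reshape(-1), rcond=None)
    return (A_stacked @ g).reshape(V, D)


def inconsistency(images):
    """Distance between the views and their best single-3D-model rendering."""
    return float(np.linalg.norm(images - refit_and_render(images)))


def sample(weight):
    """Jointly denoise all views; weight=0 disables the 3D feedback."""
    x = np.random.default_rng(1).normal(size=(V, D))   # fixed start noise
    for i in range(T):
        a = alpha_bar[i]
        a_next = alpha_bar[i + 1] if i + 1 < T else 1.0
        eps = predict_noise(x, a)
        # 1) Predict clean images from the noisy latents.
        x0_hat = (x - np.sqrt(1.0 - a) * eps) / np.sqrt(a)
        # 2) Refit the 3D model and measure the photometric discrepancy.
        residual = x0_hat - refit_and_render(x0_hat)
        # 3) Gradient of 0.5 * ||x0_hat - render||^2 w.r.t. eps,
        #    with the re-rendering treated as a constant.
        grad_eps = -np.sqrt(1.0 - a) / np.sqrt(a) * residual
        eps = eps - weight * a / (1.0 - a) * grad_eps
        # 4) Deterministic DDIM-style step with the corrected noise.
        x0_hat = (x - np.sqrt(1.0 - a) * eps) / np.sqrt(a)
        x = np.sqrt(a_next) * x0_hat + np.sqrt(1.0 - a_next) * eps
    return x


if __name__ == "__main__":
    print("without feedback:", inconsistency(sample(0.0)))
    print("with feedback:   ", inconsistency(sample(0.8)))
```

In this toy setting the feedback term shrinks the part of each view that the shared 3D model cannot explain, which mirrors the steering effect the abstract describes. The real method would replace the least-squares fit with 3DGS optimization and the stub with a pretrained diffusion network.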