🤖 AI Summary
Existing 3D Gaussian avatar pipelines depend on extensive multiview expression sequences, which limits their data efficiency and scalability. This work proposes SAGE (Self-Adaptive Gaussian Expression), a framework that learns expression-induced Gaussian deformations in a self-supervised manner, without long training sequences. It produces high-quality, animatable avatars in three regimes: from a single multiview frame (timestep), from a monocular capture containing only head rotations, or in a one-shot setting with no pretraining or priors. The method jointly optimizes 2D Gaussian surfels and a signed distance field (SDF) to keep the Gaussians compact and surface-aligned, and uses geometric and appearance consistency constraints in place of expression sequences. SAGE achieves reconstruction and animation quality comparable to state-of-the-art methods while reducing data requirements by several orders of magnitude, making avatar creation more accessible and data-efficient.
📝 Abstract
Modeling dynamic facial expressions using 3D Gaussian representations remains challenging due to their unstructured nature. Conventional Gaussian avatar pipelines require extensive multiview and sequential expression data, limiting scalability and accessibility. In this work, we introduce Self-Adaptive Gaussian Expression (SAGE), a framework for self-learning expression-induced Gaussian deformations that enables high-fidelity, animatable avatars from minimal input data. Our method jointly optimizes 2D Gaussian surfels and a Signed Distance Field (SDF) to enforce compact, surface-aligned Gaussian distributions, while a self-supervised expression learning phase replaces long training sequences with geometric and appearance consistency constraints. This design allows flexible deployment across multiple reconstruction regimes: in the multiview setting, only a single frame (timestep) is required instead of thousands; in the monocular setting, only head rotations are needed without expression sequences; and in the one-shot setting, no pretraining or priors are necessary. Experiments demonstrate that our approach achieves reconstruction and animation quality comparable to state-of-the-art methods, while reducing data requirements by several orders of magnitude. Our results highlight the potential of self-supervised Gaussian deformation learning as a step toward accessible, data-efficient avatar creation.
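As a rough illustration of what "surface-aligned" could mean when surfels and an SDF are optimized together, the sketch below scores a set of surfels against a signed distance field. It is not the SAGE objective, which the abstract does not specify. The analytic sphere SDF stands in for a learned one, and the function names, the two loss terms, and `normal_weight` are all assumptions made for the example.

```python
import numpy as np

def sphere_sdf(points, radius=1.0):
    """Signed distance to an origin-centred sphere (stand-in for a learned SDF)."""
    return np.linalg.norm(points, axis=-1) - radius

def sphere_sdf_grad(points):
    """Analytic SDF gradient for the sphere: the unit outward normal."""
    return points / np.linalg.norm(points, axis=-1, keepdims=True)

def surface_alignment_loss(centers, normals, radius=1.0, normal_weight=0.1):
    """Hypothetical penalty for surfels that drift off, or tilt away from, the SDF surface."""
    # Distance term: pull surfel centres onto the SDF zero level set.
    dist_term = np.abs(sphere_sdf(centers, radius)).mean()
    # Normal term: align each surfel normal with the SDF gradient (sign-agnostic).
    cos = np.abs(np.sum(normals * sphere_sdf_grad(centers), axis=-1))
    normal_term = (1.0 - cos).mean()
    return dist_term + normal_weight * normal_term

# Surfels lying on the unit sphere with matching normals incur (near) zero loss;
# moving the centres off the surface increases the distance term.
rng = np.random.default_rng(0)
centers = rng.normal(size=(100, 3))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)
on_surface = surface_alignment_loss(centers, centers)
off_surface = surface_alignment_loss(1.5 * centers, centers)
```

In a real pipeline the SDF would be a trainable network and the loss would be minimized with respect to both the surfel parameters and the SDF. The sketch only shows the shape of such a coupling term.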