GAT-NeRF: Geometry-Aware-Transformer Enhanced Neural Radiance Fields for High-Fidelity 4D Facial Avatars

📅 2025-10-03
🏛️ ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reconstructing high-fidelity 4D dynamic facial avatars from monocular video remains challenging, particularly when recovering high-frequency details such as wrinkles. To address this, the work proposes GAT-NeRF, a framework that introduces a lightweight Geometry-Aware Transformer, augmented with explicit geometric priors, into the NeRF architecture, reportedly for the first time. Combined with a coordinate-aligned MLP, the Transformer fuses 3D coordinates, 3DMM expression parameters, and learnable latent codes to strengthen local geometric modeling. The authors report that the approach outperforms existing methods in visual fidelity and high-frequency detail recovery, enabling more realistic and fine-grained 4D facial reconstruction.

📝 Abstract
High-fidelity 4D dynamic facial avatar reconstruction from monocular video is a critical yet challenging task, driven by increasing demands for immersive virtual human applications. While Neural Radiance Fields (NeRF) have advanced scene representation, their capacity to capture high-frequency facial details, such as dynamic wrinkles and subtle textures from information-constrained monocular streams, requires significant enhancement. To tackle this challenge, we propose a novel hybrid neural radiance field framework, called Geometry-Aware-Transformer Enhanced NeRF (GAT-NeRF) for high-fidelity and controllable 4D facial avatar reconstruction, which integrates the Transformer mechanism into the NeRF pipeline. GAT-NeRF synergistically combines a coordinate-aligned Multilayer Perceptron (MLP) with a lightweight Transformer module, termed as Geometry-Aware-Transformer (GAT) due to its processing of multi-modal inputs containing explicit geometric priors. The GAT module is enabled by fusing multi-modal input features, including 3D spatial coordinates, 3D Morphable Model (3DMM) expression parameters, and learnable latent codes to effectively learn and enhance feature representations pertinent to fine-grained geometry. The Transformer’s effective feature learning capabilities are leveraged to significantly augment the modeling of complex local facial patterns like dynamic wrinkles and acne scars. Comprehensive experiments unequivocally demonstrate GAT-NeRF’s state-of-the-art performance in visual fidelity and high-frequency detail recovery, forging new pathways for creating realistic dynamic digital humans for multimedia applications.
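
To make the fusion idea in the abstract concrete, the following is a minimal sketch of how a single 3D sample point might be combined with 3DMM expression parameters and a learnable latent code through one self-attention step before an MLP head predicts density and color. It is not the authors' implementation. The abstract does not specify the architecture, so the three-token layout, the token width `D`, the frequency encoding, the layer sizes, and every name here (`make_params`, `query_point`, and so on) are assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random

D = 8  # shared token width (hypothetical; not taken from the paper)

def positional_encoding(p, num_freqs=4):
    """NeRF-style frequency encoding: [p, sin(2^k p), cos(2^k p)] for k < num_freqs."""
    out = list(p)
    for k in range(num_freqs):
        out += [math.sin(2.0 ** k * x) for x in p]
        out += [math.cos(2.0 ** k * x) for x in p]
    return out

def init_linear(rng, d_in, d_out):
    """Random bias-free weight matrix with d_out rows and d_in columns."""
    s = 1.0 / math.sqrt(d_in)
    return [[rng.uniform(-s, s) for _ in range(d_in)] for _ in range(d_out)]

def linear(W, x):
    """Matrix-vector product W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    total = sum(e)
    return [v / total for v in e]

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention with a residual connection."""
    q = [linear(Wq, t) for t in tokens]
    k = [linear(Wk, t) for t in tokens]
    v = [linear(Wv, t) for t in tokens]
    scale = math.sqrt(len(q[0]))
    fused = []
    for tok, qi in zip(tokens, q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        weights = softmax(scores)
        mixed = [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(tok))]
        fused.append([t + m for t, m in zip(tok, mixed)])
    return fused

def make_params(expr_dim, latent_dim, num_freqs=4, seed=0):
    """Randomly initialised weights; a real model would learn these."""
    rng = random.Random(seed)
    pe_dim = 3 + 6 * num_freqs
    return {
        "embed_xyz": init_linear(rng, pe_dim, D),
        "embed_expr": init_linear(rng, expr_dim, D),
        "embed_latent": init_linear(rng, latent_dim, D),
        "Wq": init_linear(rng, D, D),
        "Wk": init_linear(rng, D, D),
        "Wv": init_linear(rng, D, D),
        "hidden": init_linear(rng, 3 * D, 16),
        "head": init_linear(rng, 16, 4),
    }

def query_point(p, expr, latent, params):
    """Return (density, rgb) for one sample point under the assumed layout."""
    # One token per modality: encoded coordinates, expression, latent code.
    tokens = [
        linear(params["embed_xyz"], positional_encoding(p)),
        linear(params["embed_expr"], expr),
        linear(params["embed_latent"], latent),
    ]
    fused = self_attention(tokens, params["Wq"], params["Wk"], params["Wv"])
    flat = [v for tok in fused for v in tok]
    hidden = [max(0.0, h) for h in linear(params["hidden"], flat)]  # ReLU
    raw = linear(params["head"], hidden)
    sigma = max(0.0, raw[0])                             # non-negative density
    rgb = [1.0 / (1.0 + math.exp(-c)) for c in raw[1:]]  # colors in (0, 1)
    return sigma, rgb

# Usage: placeholder sizes for the expression vector and latent code.
params = make_params(expr_dim=5, latent_dim=3)
sigma, rgb = query_point([0.1, -0.2, 0.3], [0.0] * 5, [0.5] * 3, params)
```

In a full pipeline, a function like `query_point` would be evaluated at many samples along each camera ray and the results composited by volume rendering. That stage, along with training, is omitted here.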
Problem

Research questions and friction points this paper is trying to address.

4D facial avatar
high-fidelity reconstruction
monocular video
dynamic wrinkles
Neural Radiance Fields
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Radiance Fields
Transformer
Geometry-Aware
4D Facial Avatar
High-Fidelity Reconstruction
Zhe Chang
Department of Control Science and Engineering, University of Shanghai for Science and Technology, China
Haodong Jin
Department of Control Science and Engineering, University of Shanghai for Science and Technology, China
Yan Song
University of Shanghai for Science and Technology
Model predictive control, Machine learning and data analysis, Image processing and intelligent systems
Ying Sun
Business School, University of Shanghai for Science and Technology, China
Hui Yu
Professor of Visual and Cognitive Computing, University of Glasgow
Visual Computing, Cognitive Computing, Social Robot, Parallel Intelligence