GAT-NeRF: Geometry-Aware-Transformer Enhanced Neural Radiance Fields for High-Fidelity 4D Facial Avatars

📅 2025-10-03

🏛️ ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Reconstructing high-fidelity 4D dynamic facial avatars from monocular video remains difficult, particularly when recovering high-frequency details such as wrinkles. This work proposes GAT-NeRF, a framework that adds a lightweight Geometry-Aware Transformer, which operates on inputs carrying explicit geometric priors, to the NeRF architecture. The framework pairs a coordinate-aligned MLP with this Transformer module, which fuses 3D spatial coordinates, 3DMM expression parameters, and learnable latent codes to strengthen local geometric modeling. The authors report state-of-the-art visual fidelity and high-frequency detail recovery, enabling more realistic and fine-grained 4D facial reconstruction.
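To make the idea of fusing the three input modalities concrete, here is a minimal sketch of how a per-point input vector might be assembled. It is an illustration under stated assumptions, not the paper's implementation: the NeRF-style frequency encoding, the simple concatenation, the function names `positional_encoding` and `fuse_inputs`, and all dimensions are hypothetical choices that the summary does not specify.

```python
import math
from typing import List, Sequence


def positional_encoding(xyz: Sequence[float], num_freqs: int) -> List[float]:
    """NeRF-style frequency encoding (an assumption here, not confirmed for GAT-NeRF).

    Keeps the raw coordinates, then appends sin/cos of each coordinate
    at frequencies 2^k * pi for k = 0 .. num_freqs - 1.
    """
    encoded = list(xyz)
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        for value in xyz:
            encoded.append(math.sin(freq * value))
            encoded.append(math.cos(freq * value))
    return encoded


def fuse_inputs(
    xyz: Sequence[float],
    expression: Sequence[float],
    latent: Sequence[float],
    num_freqs: int = 4,
) -> List[float]:
    """Concatenate encoded coordinates, 3DMM expression parameters, and a latent code.

    Plain concatenation is a hypothetical fusion scheme chosen for clarity.
    """
    return positional_encoding(xyz, num_freqs) + list(expression) + list(latent)


# Hypothetical sizes: 8 expression parameters and a 4-dimensional latent code.
point = (0.1, -0.2, 0.3)
expression = [0.0] * 8
latent = [0.0] * 4
fused = fuse_inputs(point, expression, latent)
print(len(fused))  # 3 raw + 3 * 2 * 4 encoded + 8 + 4 = 39
```

In a full pipeline a vector like `fused` would be the per-sample input to the downstream network; in practice the latent code would be a learnable parameter rather than a fixed list.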

📝 Abstract

High-fidelity 4D dynamic facial avatar reconstruction from monocular video is a critical yet challenging task, driven by increasing demands for immersive virtual human applications. While Neural Radiance Fields (NeRF) have advanced scene representation, their capacity to capture high-frequency facial details, such as dynamic wrinkles and subtle textures from information-constrained monocular streams, requires significant enhancement. To tackle this challenge, we propose a novel hybrid neural radiance field framework, called Geometry-Aware-Transformer Enhanced NeRF (GAT-NeRF), for high-fidelity and controllable 4D facial avatar reconstruction, which integrates the Transformer mechanism into the NeRF pipeline. GAT-NeRF combines a coordinate-aligned Multilayer Perceptron (MLP) with a lightweight Transformer module, termed the Geometry-Aware-Transformer (GAT) because it processes multi-modal inputs containing explicit geometric priors. The GAT module fuses multi-modal input features, including 3D spatial coordinates, 3D Morphable Model (3DMM) expression parameters, and learnable latent codes, to learn and enhance feature representations pertinent to fine-grained geometry. The Transformer’s feature learning capabilities are leveraged to significantly augment the modeling of complex local facial patterns like dynamic wrinkles and acne scars. Comprehensive experiments demonstrate GAT-NeRF’s state-of-the-art performance in visual fidelity and high-frequency detail recovery, forging new pathways for creating realistic dynamic digital humans for multimedia applications.
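The abstract does not say how the Transformer module attends over its multi-modal inputs. As a rough illustration of the general mechanism only, the sketch below applies single-head scaled dot-product self-attention to three hypothetical tokens, one per modality. The tokenization, the shared token width, and the omission of learned query/key/value projections are all assumptions made for brevity, and `self_attention` is not a name from the paper.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Simplification (assumption): queries, keys, and values are the tokens
    themselves; a real Transformer layer would apply learned projections.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for query in tokens:
        # Similarity of this query to every token, scaled by 1/sqrt(dim).
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        # Weighted average of all tokens using the attention weights.
        mixed = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
        output.append(mixed)
    return output


# Hypothetical 4-dimensional tokens, one per input modality.
tokens = [
    [0.1, -0.2, 0.3, 0.0],   # embedded 3D coordinate
    [0.5, 0.0, -0.1, 0.2],   # embedded 3DMM expression parameters
    [0.0, 0.3, 0.3, -0.4],   # learnable latent code
]
attended = self_attention(tokens)
print(len(attended), len(attended[0]))
```

Each output token is a weighted mixture of all three input tokens, which is the sense in which attention lets information from one modality modulate the representation of another.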

Problem

Research questions and friction points this paper is trying to address.

4D facial avatar

high-fidelity reconstruction

monocular video

dynamic wrinkles

Neural Radiance Fields

Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Radiance Fields

Transformer

Geometry-Aware