🤖 AI Summary
This work addresses monocular video-driven, relightable, and animatable human avatar modeling. To overcome the real-time rendering bottleneck of implicit neural fields, we propose an efficient neural rendering framework. Our method distills knowledge from an implicit neural field into an explicit 2D Gaussian point cloud representation, enabling high-speed differentiable rasterization. We introduce part-level ambient occlusion (AO) probes to generate pixel-accurate dynamic shadows in a single forward pass. Additionally, we adopt a PBR-aware material decomposition and approximation strategy to ensure physically consistent relighting. Quantitatively, our approach achieves relighting quality on par with or superior to the teacher implicit model while operating at 67 FPS, 370× faster than the original neural field. To our knowledge, this is the first method enabling high-fidelity, real-time, relightable, and animatable human rendering from monocular video input, making it suitable for interactive applications such as VR, sports analytics, and gaming.
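The single-query shadow idea can be illustrated with a minimal sketch. The paper's actual probe parameterization is not specified here, so every name below is hypothetical: we assume each body part stores a small regular grid of precomputed AO values in its canonical frame, and each shaded pixel performs exactly one lookup into the grid of the part it belongs to.

```python
import numpy as np

def query_part_ao(probe_grids, part_id, local_pt):
    """One AO lookup per pixel (hypothetical probe layout, not the paper's exact scheme).

    probe_grids: dict mapping part_id -> (res, res, res) float array of AO in [0, 1].
    local_pt:    3D point in the part's canonical [0, 1]^3 coordinate frame.
    Returns a scalar ambient-occlusion factor via a nearest-cell read.
    """
    grid = probe_grids[part_id]
    res = grid.shape[0]
    # Map the canonical coordinate to a grid cell and clamp to valid indices.
    idx = np.clip((np.asarray(local_pt) * res).astype(int), 0, res - 1)
    return float(grid[tuple(idx)])

# Usage: a 4^3 probe grid for one part, queried once for a pixel on that part.
arm_grid = np.ones((4, 4, 4))
arm_grid[1, 2, 3] = 0.5  # a partially occluded cell
ao = query_part_ao({"arm": arm_grid}, "arm", (0.3, 0.6, 0.9))
```

A trilinear interpolation over the eight surrounding cells would give smoother shadows at the same one-query-per-pixel cost, since all eight reads hit the same small, cache-resident grid.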
📝 Abstract
Creating relightable and animatable human avatars from monocular videos is a rising research topic with a range of applications, e.g., virtual reality, sports, and video games. Previous works utilize neural fields together with physically based rendering (PBR) to estimate geometry and disentangle the appearance properties of human avatars. However, one drawback of these methods is their slow rendering speed due to expensive Monte Carlo ray tracing. To tackle this problem, we propose to distill the knowledge from an implicit neural field (teacher) into an explicit 2D Gaussian splatting representation (student), taking advantage of the fast rasterization of Gaussian splatting. To avoid ray tracing, we employ the split-sum approximation for PBR appearance. We also propose novel part-wise ambient occlusion probes for shadow computation. Shadow prediction is achieved by querying these probes only once per pixel, which paves the way for real-time relighting of avatars. Combined, these techniques yield high-quality relighting results with realistic shadow effects. Our experiments demonstrate that the proposed student model achieves relighting results comparable to or even better than those of our teacher model while being 370 times faster at inference, reaching a rendering speed of 67 FPS.
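The split-sum approximation mentioned above factors the specular integral into a prefiltered environment-map sample multiplied by a pre-integrated BRDF term, removing the need for per-pixel ray tracing. The sketch below uses Karis's well-known analytic fit to the pre-integrated BRDF lookup table; the prefiltered radiance is passed in as a plain scalar stand-in for an actual mip-filtered cubemap fetch, which this paper's pipeline would supply.

```python
import numpy as np

def env_brdf_approx(roughness, n_dot_v):
    """Analytic fit to the pre-integrated environment BRDF (Karis's mobile approximation).

    Returns (scale, bias) such that the specular reflectance is f0 * scale + bias.
    """
    c0 = np.array([-1.0, -0.0275, -0.572, 0.022])
    c1 = np.array([1.0, 0.0425, 1.04, -0.04])
    r = roughness * c0 + c1
    a004 = min(r[0] * r[0], 2.0 ** (-9.28 * n_dot_v)) * r[0] + r[1]
    scale = -1.04 * a004 + r[2]
    bias = 1.04 * a004 + r[3]
    return scale, bias

def split_sum_specular(prefiltered_radiance, f0, roughness, n_dot_v):
    """Split-sum specular shading: prefiltered-light sum times pre-integrated BRDF sum.

    prefiltered_radiance stands in for a roughness-indexed environment-map lookup.
    f0 is the Fresnel reflectance at normal incidence (e.g. 0.04 for dielectrics).
    """
    scale, bias = env_brdf_approx(roughness, n_dot_v)
    return prefiltered_radiance * (f0 * scale + bias)

# Usage: a dielectric surface at mid roughness, viewed at a grazing-ish angle.
spec = split_sum_specular(prefiltered_radiance=1.0, f0=0.04,
                          roughness=0.5, n_dot_v=0.7)
```

Because both factors are precomputed (a mip chain for the light term, a 2D LUT or the analytic fit above for the BRDF term), each pixel's specular shading reduces to one texture fetch and a fused multiply-add, which is what makes the 67 FPS figure attainable.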