Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures

📅 2024-12-17
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address real-time free-viewpoint human rendering from sparse-view RGB video, a setting constrained by few sensors, a tight latency budget, and strong coupling between geometry and appearance, this paper proposes Double Unprojected Textures, a framework that decouples the two. First, an image-conditioned template deformation network disentangles coarse geometry estimation from fine-grained appearance synthesis. Second, a two-stage texture unprojection mechanism, combined with a Gaussian-splat representation, enables efficient rendering learned by a 2D CNN in texture space. The method markedly improves robustness to unseen body poses and achieves photorealistic 4K output. Quantitative and qualitative evaluations show consistent gains over state-of-the-art methods: more accurate geometric alignment, fewer texture artifacts, and better generalization, enabling real-time 4K free-viewpoint rendering.

📝 Abstract
Real-time free-view human rendering from sparse-view RGB inputs is a challenging task due to sensor scarcity and the tight time budget. To ensure efficiency, recent methods leverage 2D CNNs operating in texture space to learn rendering primitives. However, they either jointly learn geometry and appearance, or completely ignore sparse image information for geometry estimation, significantly harming visual quality and robustness to unseen body poses. To address these issues, we present Double Unprojected Textures, which at the core disentangles coarse geometric deformation estimation from appearance synthesis, enabling robust and photorealistic 4K rendering in real-time. Specifically, we first introduce a novel image-conditioned template deformation network, which estimates the coarse deformation of the human template from a first unprojected texture. This updated geometry is then used to apply a second and more accurate texture unprojection. The resulting texture map has fewer artifacts and better alignment with input views, which benefits our learning of finer-level geometry and appearance represented by Gaussian splats. We validate the effectiveness and efficiency of the proposed method in quantitative and qualitative experiments, in which it significantly surpasses other state-of-the-art methods.
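The texture unprojection the abstract refers to means sampling colors from the input views at the projected positions of the body template and scattering them into UV texture space. The paper does not publish its implementation here, so the following is only a minimal per-vertex sketch under simplifying assumptions (pinhole projection matrices, nearest-texel scatter, no visibility test); the function and parameter names are hypothetical:

```python
import numpy as np

def unproject_texture(vertices, uvs, images, cam_matrices, tex_size=64):
    """Hypothetical sketch: scatter view-sampled vertex colors into a UV texture.

    vertices: (N, 3) template vertex positions
    uvs: (N, 2) per-vertex UV coordinates in [0, 1]
    images: list of (H, W, 3) input views
    cam_matrices: list of (3, 4) pinhole projection matrices
    """
    texture = np.zeros((tex_size, tex_size, 3))
    weight = np.zeros((tex_size, tex_size, 1))
    for img, P in zip(images, cam_matrices):
        # Project all vertices into this view (homogeneous coordinates).
        homo = np.hstack([vertices, np.ones((len(vertices), 1))])
        proj = homo @ P.T
        px = (proj[:, :2] / proj[:, 2:3]).astype(int)
        h, w = img.shape[:2]
        inside = (px[:, 0] >= 0) & (px[:, 0] < w) & (px[:, 1] >= 0) & (px[:, 1] < h)
        for i in np.where(inside)[0]:
            # Nearest texel for this vertex's UV coordinate.
            tu = int(uvs[i, 0] * (tex_size - 1))
            tv = int(uvs[i, 1] * (tex_size - 1))
            texture[tv, tu] += img[px[i, 1], px[i, 0]]
            weight[tv, tu] += 1.0
    # Average contributions from all views; untouched texels stay zero.
    return texture / np.maximum(weight, 1e-8)
```

A real implementation would rasterize per texel rather than per vertex and handle occlusion, but the sketch conveys why a more accurate (deformed) geometry yields a better-aligned texture: the projection points `px` move with the vertices.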
Problem

Research questions and friction points this paper is trying to address.

Real-time human rendering from sparse RGB videos
Disentangling geometry and appearance for robust rendering
Achieving photorealistic 4K rendering in real-time
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Double Unprojected Textures for rendering
Disentangles geometry and appearance synthesis
Employs Gaussian splats for finer details
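The innovations above compose into a two-stage pipeline: unproject a first texture from the undeformed template, predict a coarse deformation from it, unproject again with the updated geometry, then predict Gaussian-splat parameters from the cleaner second texture. A schematic sketch, with all networks and the unprojection/rasterization steps passed in as placeholder callables (none of these names come from the paper):

```python
import numpy as np

def double_unprojection_pipeline(images, cams, template,
                                 deform_net, gaussian_net,
                                 unproject, rasterize):
    """Hypothetical sketch of the double-unprojection flow described in the summary."""
    # Stage 1: unproject input views onto the undeformed template (first texture).
    tex1 = unproject(template, images, cams)
    # Image-conditioned deformation: estimate coarse template deformation from tex1.
    deformed = template + deform_net(tex1)
    # Stage 2: unproject again with updated geometry; tex2 is better aligned.
    tex2 = unproject(deformed, images, cams)
    # Predict finer-level geometry and appearance as Gaussian-splat parameters.
    gaussians = gaussian_net(tex2)
    # Render the splats for the requested viewpoints.
    return rasterize(gaussians, cams)
```

In the paper both networks are 2D CNNs operating in texture space, which is what keeps the pipeline within a real-time budget.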
🔎 Similar Papers
2024-07-15 · IEEE Transactions on Visualization and Computer Graphics · Citations: 2