🤖 AI Summary
This work addresses the challenges of geometric inconsistency and lack of photorealism in multi-view human pose image generation from a single input image. We propose a dual-conditional diffusion framework that jointly leverages 3D-aware neural rendering and parametric human priors: Human NeRF provides geometrically consistent coarse multi-view renderings; the SMPL model extracts texture, normal, and semantic features, which are fused hierarchically to jointly optimize global structure and local details; finally, a diffusion model performs high-fidelity image refinement. Our method significantly outperforms existing approaches under complex poses, loose clothing, and occlusion scenarios. It achieves joint novel-view and novel-pose synthesis with precise geometry, sharp details, and strong cross-view consistency. The framework establishes a robust and efficient paradigm for single-image human editing, enabling controllable, high-quality, and geometrically faithful human image generation.
📝 Abstract
The creation of lifelike human avatars capable of realistic pose variation and viewpoint flexibility remains a fundamental challenge in computer vision and graphics. Current approaches typically either yield geometrically inconsistent multi-view images or sacrifice photorealism, resulting in blurry outputs under diverse viewing angles and complex motions. To address these issues, we propose Blur2Sharp, a novel framework integrating 3D-aware neural rendering and diffusion models to generate sharp, geometrically consistent novel-view images from only a single reference view. Our method employs a dual-conditioning architecture: initially, a Human NeRF model generates geometrically coherent multi-view renderings for target poses, explicitly encoding 3D structural guidance. Subsequently, a diffusion model conditioned on these renderings refines the generated images, preserving fine-grained details and structural fidelity. We further enhance visual quality through hierarchical feature fusion, incorporating texture, normal, and semantic priors extracted from parametric SMPL models to simultaneously improve global coherence and local detail accuracy. Extensive experiments demonstrate that Blur2Sharp consistently surpasses state-of-the-art techniques in both novel pose and view generation tasks, particularly excelling under challenging scenarios involving loose clothing and occlusions.
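The two-stage conditioning flow described above (coarse geometric rendering → prior fusion → diffusion refinement) can be sketched as follows. This is a minimal illustrative sketch only: every function name, shape, and operation here is a hypothetical stand-in, since the actual Human NeRF, SMPL feature extractors, and diffusion refiner are learned networks not specified at this level of detail in the abstract.

```python
import numpy as np

H, W = 64, 64  # hypothetical render resolution

def human_nerf_render(pose, view):
    """Stage 1 stand-in: a coarse but geometrically consistent rendering.
    Here just a smooth gradient image parameterized by pose and view."""
    ys, xs = np.mgrid[0:H, 0:W] / H
    return np.stack([ys * pose, xs * view, ys * xs], axis=-1)

def smpl_priors(pose):
    """Stand-in for texture / normal / semantic feature maps from SMPL."""
    base = np.full((H, W, 3), pose)
    return {"texture": base, "normal": base * 0.5, "semantic": base * 0.25}

def hierarchical_fusion(coarse, priors, weights=(0.5, 0.3, 0.2)):
    """Fuse SMPL priors into the coarse render; a weighted sum stands in
    for the paper's hierarchical feature fusion."""
    fused = coarse.copy()
    for w, feat in zip(weights, priors.values()):
        fused += w * feat
    return fused

def diffusion_refine(fused, steps=4):
    """Stage 2 stand-in: iterative refinement toward a sharp image.
    The real model is a conditional diffusion denoiser; here, a simple
    clipped unsharp-masking loop plays that role."""
    img = np.clip(fused, 0.0, 1.0)
    for _ in range(steps):
        blur = (np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)) / 2
        img = np.clip(img + 0.1 * (img - blur), 0.0, 1.0)
    return img

coarse = human_nerf_render(pose=0.7, view=0.3)        # 3D structural guidance
fused = hierarchical_fusion(coarse, smpl_priors(0.7)) # global + local priors
sharp = diffusion_refine(fused)                       # high-fidelity refinement
print(sharp.shape)  # (64, 64, 3)
```

The design point the sketch mirrors is the dual conditioning: the refinement stage never sees the input image alone, but always a geometry-aware rendering plus parametric human priors.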