🤖 AI Summary
This work addresses the challenge of animating a static neural point cloud to follow a single fixed-camera video and recovering a re-posable skeletal rig, while avoiding the joint-boundary artifacts of existing methods. The proposed method, RigPAPR, combines automatic rigging with Proximity Attention Point Rendering (PAPR) and drives the point cloud directly via linear blend skinning (LBS), with no mesh proxy, pose-dependent correction, or category-specific template. Because PAPR assigns no shape to individual points and recomposes each pixel from the deformed point positions at render time, it avoids the gaps and spikes that Gaussian splats develop around joints, yielding cleaner articulation on both synthetic and real-world data. On synthetic subjects, RigPAPR matches the strongest baseline at the supervised view and outperforms mesh-based and Gaussian-splatting baselines at novel views by more than 3 dB PSNR.
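As a minimal sketch of driving points directly with LBS, the snippet below poses each canonical point as a weighted sum of its bones' rigid transforms, x' = sum_k w_k (R_k x + t_k). It is plain Python for self-containment; the function names, the (R, t) pair representation, and the two-bone toy rig are illustrative assumptions, not RigPAPR's implementation.

```python
import math

def rot_z(theta):
    """3x3 rotation about the z-axis, as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply(R, t, p):
    """Rigid transform R @ p + t for a 3-vector p."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def rotate_about(R, c):
    """Return (R, t) for a rotation R about pivot c, i.e. x -> R (x - c) + c."""
    Rc = apply(R, [0.0, 0.0, 0.0], c)
    return R, [c[i] - Rc[i] for i in range(3)]

def lbs(points, weights, bone_transforms):
    """Linear blend skinning of point positions (hypothetical helper).

    points: canonical 3-D positions.
    weights: per-point skinning weights, one per bone, summing to 1.
    bone_transforms: (R, t) pairs mapping canonical space to posed space.
    """
    posed = []
    for p, w in zip(points, weights):
        q = [0.0, 0.0, 0.0]
        for wk, (R, t) in zip(w, bone_transforms):
            if wk == 0.0:
                continue  # bone has no influence on this point
            moved = apply(R, t, p)
            q = [q[i] + wk * moved[i] for i in range(3)]
        posed.append(q)
    return posed

# Toy rig: bone 0 stays fixed; bone 1 rotates 90 degrees about z around a joint at x = 1.
IDENTITY = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])
bones = [IDENTITY, rotate_about(rot_z(math.pi / 2), [1.0, 0.0, 0.0])]
points = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 0.0, 0.0]]
weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
posed = lbs(points, weights, bones)
# posed is approximately [[0, 0, 0], [1.25, 0.25, 0], [1, 1, 0]]
```

Only positions are transformed here: a point shared by two bones lands at the weighted average of where each bone would carry it, and nothing else about the point needs updating.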
📝 Abstract
Static neural point reconstructions capture a subject at high fidelity from posed images. Given such a reconstruction, we aim to animate it to follow a monocular, fixed-viewpoint driving video of the subject, whether captured by a camera or produced by image-to-video (I2V) generation, and to recover a rigged, re-posable 3D asset. Existing methods deform Gaussian splats either through direct linear blend skinning (LBS) or through mesh proxies; both are prone to joint-boundary artifacts under articulation, even with per-primitive corrections. We trace these artifacts to the representation: each splat carries an individual shape, calibrated in the canonical pose to tile with its neighbours. Under rigid LBS, each splat moves with its bone but cannot bend, so the canonical tiling breaks at joint boundaries into gaps and spikes. Proximity Attention Point Rendering (PAPR), by contrast, carries no per-primitive shape: each pixel is recomposed at render time from the positions of the deformed primitives, so the surface re-forms naturally as the subject articulates. We present RigPAPR, which auto-rigs a static PAPR point cloud and drives it with direct LBS from a single fixed-viewpoint video, without a mesh proxy, pose-dependent correction, or category template. On synthetic subjects, RigPAPR matches the strongest baseline at the supervised view and outperforms mesh-based and Gaussian-splatting baselines at novel views by more than 3 dB PSNR, while producing cleaner renderings at joint boundaries on both synthetic and real subjects.
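To make the contrast with splats concrete, the sketch below recomposes one pixel from point positions alone: it picks the k points nearest a camera ray and blends their colours with softmax weights over negative ray-to-point distances. This is a deliberately simplified stand-in, not PAPR itself, which learns its attention weights from per-point features rather than using a fixed distance softmax; the names `composite_pixel`, `k`, and `temperature` are hypothetical. What it illustrates is the argument above: the weights are recomputed from wherever the points currently sit, so moving points (for example by LBS) requires no per-primitive shape update.

```python
import math

def ray_point_distance(origin, direction, p):
    """Perpendicular distance from point p to a ray (direction assumed unit length)."""
    v = [p[i] - origin[i] for i in range(3)]
    s = max(0.0, sum(v[i] * direction[i] for i in range(3)))  # clamp to the forward half of the ray
    closest = [origin[i] + s * direction[i] for i in range(3)]
    return math.dist(p, closest)

def composite_pixel(origin, direction, points, colors, k=4, temperature=0.1):
    """Simplified proximity-weighted compositing for one pixel ray (hypothetical).

    Selects the k points closest to the ray and blends their colours with
    softmax(-distance / temperature) weights. A stand-in for learned attention.
    """
    dists = [ray_point_distance(origin, direction, p) for p in points]
    nearest = sorted(range(len(points)), key=lambda i: dists[i])[:k]
    logits = [-dists[i] / temperature for i in nearest]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * colors[i][c] for w, i in zip(weights, nearest)) for c in range(3)]

# A ray along +z through the origin; initially only the red point lies on it.
origin, direction = [0.0, 0.0, -5.0], [0.0, 0.0, 1.0]
colors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
before = composite_pixel(origin, direction,
                         [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], colors, k=3)
# After a deformation moves the blue point onto the ray, the same pixel is
# recomposed from the new positions alone.
after = composite_pixel(origin, direction,
                        [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], colors, k=3)
```

Here `before` is almost pure red, while `after` is close to an even red/blue mix, because the blue point now lies on the ray as well.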