Fast and Lightweight Novel View Synthesis with Differentiable Multiplane Image

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of balancing rendering speed, model size, and synthesis quality under sparse input views, a key limitation for deploying novel view synthesis methods on resource-constrained devices. The authors propose an approach based on the Multiplane Image (MPI) representation, using point maps predicted by visual foundation models for geometric initialization, followed by differentiable optimization. One-step diffusion is introduced both within the differentiable optimization of the MPI and as postprocessing of the rendered output, targeting the holes and artifacts that arise in a sparsely initialized MPI. Compared with a representative 3D Gaussian Splatting method, the proposed method is 30.7% faster and uses only 14.8% of its model size, while delivering competitive synthesis quality in forward-facing scenes.

📝 Abstract

Recently, novel view synthesis has witnessed remarkable progress, with mainstream methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) delivering impressive results. However, these approaches often struggle to balance rendering speed and model size, and their optimization-based training can be highly time-consuming. Furthermore, they typically rely on dense observations and often fail to produce satisfactory results under sparse-view conditions. Although feed-forward reconstruction significantly reduces the optimization time of 3DGS, its pixel-aligned formulation generates millions of Gaussians from a single image, severely limiting its practical deployment on mobile devices. To address these limitations, we revisit the Multiplane Image (MPI) representation, which represents scenes using a compact set of planar layers for efficient novel view synthesis. Leveraging recent advances in visual foundation models, we utilize predicted point maps for reliable geometric initialization, followed by differentiable optimization. To address the issues of holes and artifacts in sparsely initialized MPI, we introduce one-step diffusion, which participates in both the differentiable optimization of MPI and the postprocessing of rendering results. Compared with a representative GS-based method, our approach is 30.7% faster and uses only 14.8% of its model size, while achieving competitive synthesis quality on front-view scenarios.
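
To make the "compact set of planar layers" concrete, the following is a minimal NumPy sketch of a generic MPI, not the authors' implementation. It assumes a per-pixel depth map is available (for example, the z-component of a predicted point map), places each pixel on the plane nearest in disparity, and renders the source view with standard back-to-front "over" compositing. The function names `init_mpi_from_depth` and `composite_back_to_front`, the plane count, and the `near`/`far` bounds are hypothetical choices made for illustration. Rendering a novel view would additionally require warping each plane with a homography, which is omitted here.

```python
import numpy as np


def init_mpi_from_depth(image, depth, num_planes=8, near=1.0, far=10.0):
    """Hypothetical initializer: assign each pixel to the plane nearest in disparity.

    image: (H, W, 3) float array, depth: (H, W) float array.
    Returns per-plane color (D, H, W, 3), alpha (D, H, W, 1), and plane depths (D,).
    """
    # Plane depths sampled uniformly in disparity (1/depth), ordered far to near.
    plane_disparity = np.linspace(1.0 / far, 1.0 / near, num_planes)
    plane_depths = 1.0 / plane_disparity

    # Nearest plane index per pixel, shape (H, W).
    disparity = 1.0 / np.clip(depth, near, far)
    idx = np.abs(disparity[..., None] - plane_disparity).argmin(axis=-1)

    # One-hot alpha: a pixel is opaque on its assigned plane, transparent elsewhere.
    plane_ids = np.arange(num_planes)[:, None, None]
    alpha = (idx[None] == plane_ids).astype(np.float32)[..., None]

    # Every plane starts with a copy of the source colors; alpha selects what is visible.
    color = np.broadcast_to(image[None], (num_planes,) + image.shape).astype(np.float32)
    return color, alpha, plane_depths


def composite_back_to_front(color, alpha):
    """Standard 'over' compositing; planes must be ordered far to near."""
    out = np.zeros(color.shape[1:], dtype=np.float32)
    for c, a in zip(color, alpha):
        out = c * a + out * (1.0 - a)
    return out
```

Because every pixel is opaque on exactly one plane in this sketch, compositing the planes in the source view reproduces the input image. Once the planes are warped to a different viewpoint, regions hidden in the source view have no opaque layer, which is one plausible reading of the "holes" in a sparsely initialized MPI that the abstract mentions.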

Problem

Research questions and friction points this paper is trying to address.

novel view synthesis

sparse-view

model size

rendering speed

mobile deployment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiplane Image

Differentiable Optimization

One-step Diffusion