LiveStre4m: Feed-Forward Live Streaming of Novel Views from Unposed Multi-View Video

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing methods for novel view synthesis in dynamic scenes, which typically rely on accurate camera parameters and time-consuming optimization, thereby precluding real-time applications such as live streaming. The authors propose a feed-forward framework for real-time novel view video generation that requires only two synchronized, uncalibrated, and sparsely sampled input video streams to reconstruct temporally consistent and stable novel views. The approach integrates a multi-view vision transformer for 3D reconstruction of keyframes with a diffusion transformer module for view interpolation, and adds a Camera Pose Predictor that estimates poses and intrinsics directly from RGB images, eliminating the need for explicit calibration. Operating at 1024×768 resolution, the method processes each frame in 0.07 seconds, a roughly 38× speedup over optimization-based techniques that require about 2.67 s per frame, significantly advancing the practicality of real-time novel view synthesis.
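As a rough illustration of the data flow described above, here is a minimal PyTorch sketch of the feed-forward pipeline. All module names, tensor shapes, and the keyframe stride are placeholders of ours, not the authors' API; the linked repository contains the actual implementation.

```python
import torch
import torch.nn as nn


class LiveStre4mSketch(nn.Module):
    """Hypothetical sketch: feed-forward NVS from two synchronized, unposed streams."""

    def __init__(self, keyframe_vit: nn.Module, pose_predictor: nn.Module,
                 interpolator: nn.Module, keyframe_stride: int = 8):
        super().__init__()
        self.keyframe_vit = keyframe_vit      # multi-view ViT: 3D reconstruction of keyframes
        self.pose_predictor = pose_predictor  # poses + intrinsics from RGB alone
        self.interpolator = interpolator      # diffusion transformer: view interpolation
        self.keyframe_stride = keyframe_stride  # assumed keyframe sampling rate

    @torch.no_grad()
    def forward(self, stream_a: torch.Tensor, stream_b: torch.Tensor,
                target_view: torch.Tensor) -> torch.Tensor:
        # stream_a, stream_b: (T, 3, 768, 1024) synchronized, uncalibrated RGB streams.
        views = torch.stack([stream_a, stream_b], dim=1)  # (T, V=2, 3, H, W)

        # Implicit calibration: estimate extrinsics and intrinsics from pixels,
        # so no ground-truth camera parameters are required.
        poses, intrinsics = self.pose_predictor(views)

        # Reconstruct 3D scene state on sparsely sampled keyframes only.
        k = self.keyframe_stride
        scene = self.keyframe_vit(views[::k], poses[::k], intrinsics)

        # Diffusion-transformer interpolation renders the in-between frames at the
        # requested novel viewpoint, keeping the output stream temporally stable.
        return self.interpolator(scene, poses, target_view)  # (T, 3, 768, 1024)
```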
📝 Abstract
Live-streaming Novel View Synthesis (NVS) from unposed multi-view video remains an open challenge across a wide range of applications. Existing methods for dynamic scene representation typically require ground-truth camera parameters and involve lengthy optimization ($\approx 2.67$s), which makes them unsuitable for live-streaming scenarios. To address this issue, we propose a novel-viewpoint video live-streaming method (LiveStre4m), a feed-forward model for real-time NVS from unposed sparse multi-view inputs. LiveStre4m introduces a multi-view vision transformer for keyframe 3D scene reconstruction, coupled with a diffusion-transformer interpolation module that ensures temporal consistency and stable streaming. In addition, a Camera Pose Predictor module efficiently estimates both poses and intrinsics directly from RGB images, removing the reliance on known camera calibration. Our approach enables temporally consistent novel-view video streaming in real time from as few as two synchronized unposed input streams. LiveStre4m attains an average reconstruction time of $0.07$s per frame at $1024 \times 768$ resolution, outperforming optimization-based dynamic scene representation methods by over an order of magnitude in runtime. These results demonstrate that LiveStre4m makes real-time NVS streaming feasible in practical settings, marking a substantial step toward deployable live novel-view synthesis systems. Code available at: https://github.com/pedro-quesado/LiveStre4m
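For a quick sense of scale, the two runtimes quoted in the abstract work out to roughly 14 FPS and a ~38× speedup. The input numbers below come from the abstract; the derived figures are our own back-of-the-envelope arithmetic, not numbers reported in the paper.

```python
# Per-frame runtimes quoted in the abstract.
livestre4m_s = 0.07  # average reconstruction time per frame at 1024x768
baseline_s = 2.67    # quoted optimization-based per-frame runtime

fps = 1.0 / livestre4m_s             # ~14.3 frames per second
speedup = baseline_s / livestre4m_s  # ~38x faster than the baseline
print(f"{fps:.1f} FPS, {speedup:.0f}x speedup")  # 14.3 FPS, 38x speedup
```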
Problem

Research questions and friction points this paper is trying to address.

Live-streaming
Novel View Synthesis
Unposed Multi-View Video
Dynamic Scene Representation
Real-time
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel View Synthesis
Unposed Multi-View Video
Real-Time Streaming
Camera Pose Estimation
Diffusion Transformer
Pedro Quesado
AIMSGroup, Department of Electrical Engineering, Eindhoven University of Technology
Erkut Akdag
AIMSGroup, Department of Electrical Engineering, Eindhoven University of Technology
Yasaman Kashefbahrami
AIMSGroup, Department of Electrical Engineering, Eindhoven University of Technology
Willem Menu
AIMSGroup, Department of Electrical Engineering, Eindhoven University of Technology
Egor Bondarev
Associate Professor, Eindhoven University of Technology
computer vision · AI · 3D reconstruction · real-time architectures · anomaly detection