LiveStre4m: Feed-Forward Live Streaming of Novel Views from Unposed Multi-View Video

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing methods for novel view synthesis in dynamic scenes, which typically rely on accurate camera parameters and time-consuming optimization, thereby precluding real-time applications such as live streaming. The authors propose a feed-forward framework for real-time novel view video generation that requires only two synchronized, uncalibrated, and sparsely sampled input video streams to reconstruct temporally consistent and stable novel views. The approach integrates a multi-view vision transformer for 3D reconstruction of keyframes with a diffusion transformer module for view interpolation, and adds a Camera Pose Predictor that estimates poses and intrinsics directly from RGB images, eliminating the need for explicit calibration. Operating at 1024×768 resolution, the method processes each frame in 0.07 seconds, a roughly 38× speedup over optimization-based techniques that require about 2.67 s per frame, significantly advancing the practicality of real-time novel view synthesis.
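As a rough illustration of the data flow described above, here is a minimal PyTorch sketch of the feed-forward pipeline. All module names, tensor shapes, and the keyframe stride are placeholders of ours, not the authors' API; the linked repository contains the actual implementation.

```python
import torch
import torch.nn as nn


class LiveStre4mSketch(nn.Module):
    """Hypothetical sketch: feed-forward NVS from two synchronized, unposed streams."""

    def __init__(self, keyframe_vit: nn.Module, pose_predictor: nn.Module,
                 interpolator: nn.Module, keyframe_stride: int = 8):
        super().__init__()
        self.keyframe_vit = keyframe_vit      # multi-view ViT: 3D reconstruction of keyframes
        self.pose_predictor = pose_predictor  # poses + intrinsics from RGB alone
        self.interpolator = interpolator      # diffusion transformer: view interpolation
        self.keyframe_stride = keyframe_stride  # assumed keyframe sampling rate

    @torch.no_grad()
    def forward(self, stream_a: torch.Tensor, stream_b: torch.Tensor,
                target_view: torch.Tensor) -> torch.Tensor:
        # stream_a, stream_b: (T, 3, 768, 1024) synchronized, uncalibrated RGB streams.
        views = torch.stack([stream_a, stream_b], dim=1)  # (T, V=2, 3, H, W)

        # Implicit calibration: estimate extrinsics and intrinsics from pixels,
        # so no ground-truth camera parameters are required.
        poses, intrinsics = self.pose_predictor(views)

        # Reconstruct 3D scene state on sparsely sampled keyframes only.
        k = self.keyframe_stride
        scene = self.keyframe_vit(views[::k], poses[::k], intrinsics)

        # Diffusion-transformer interpolation renders the in-between frames at the
        # requested novel viewpoint, keeping the output stream temporally stable.
        return self.interpolator(scene, poses, target_view)  # (T, 3, 768, 1024)
```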
📝 Abstract
Live-streaming Novel View Synthesis (NVS) from unposed multi-view video remains an open challenge across a wide range of applications. Existing methods for dynamic scene representation typically require ground-truth camera parameters and involve lengthy optimization ($\approx 2.67$s), which makes them unsuitable for live-streaming scenarios. To address this issue, we propose a novel-viewpoint video live-streaming method (LiveStre4m), a feed-forward model for real-time NVS from unposed sparse multi-view inputs. LiveStre4m introduces a multi-view vision transformer for keyframe 3D scene reconstruction, coupled with a diffusion-transformer interpolation module that ensures temporal consistency and stable streaming. In addition, a Camera Pose Predictor module efficiently estimates both poses and intrinsics directly from RGB images, removing the reliance on known camera calibration. Our approach enables temporally consistent novel-view video streaming in real time from as few as two synchronized unposed input streams. LiveStre4m attains an average reconstruction time of $0.07$s per frame at $1024 \times 768$ resolution, outperforming optimization-based dynamic scene representation methods by over an order of magnitude in runtime. These results demonstrate that LiveStre4m makes real-time NVS streaming feasible in practical settings, marking a substantial step toward deployable live novel-view synthesis systems. Code available at: https://github.com/pedro-quesado/LiveStre4m
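For a quick sense of scale, the two runtimes quoted in the abstract work out to roughly 14 FPS and a ~38× speedup. The input numbers below come from the abstract; the derived figures are our own back-of-the-envelope arithmetic, not numbers reported in the paper.

```python
# Per-frame runtimes quoted in the abstract.
livestre4m_s = 0.07  # average reconstruction time per frame at 1024x768
baseline_s = 2.67    # quoted optimization-based per-frame runtime

fps = 1.0 / livestre4m_s             # ~14.3 frames per second
speedup = baseline_s / livestre4m_s  # ~38x faster than the baseline
print(f"{fps:.1f} FPS, {speedup:.0f}x speedup")  # 14.3 FPS, 38x speedup
```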
Problem

Research questions and friction points this paper is trying to address.

Live-streaming
Novel View Synthesis
Unposed Multi-View Video
Dynamic Scene Representation
Real-time
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel View Synthesis
Unposed Multi-View Video
Real-Time Streaming
Camera Pose Estimation
Diffusion Transformer
Pedro Quesado
AIMSGroup, Department of Electrical Engineering, Eindhoven University of Technology
Erkut Akdag
AIMSGroup, Department of Electrical Engineering, Eindhoven University of Technology
Yasaman Kashefbahrami
AIMSGroup, Department of Electrical Engineering, Eindhoven University of Technology
Willem Menu
AIMSGroup, Department of Electrical Engineering, Eindhoven University of Technology
Egor Bondarev
Associate Professor, Eindhoven University of Technology
computer vision · AI · 3D reconstruction · real-time architectures · anomaly detection