🤖 AI Summary
Existing video diffusion models struggle to generate high-resolution, arbitrary-aspect-ratio 360° panoramic videos due to limitations in spatial scalability and boundary consistency. To address this, we propose the first seamless, scalable diffusion framework tailored for panoramic scenes. Our method introduces an Offset Shifting Denoiser coupled with a rotational sliding-window mechanism, enabling a fixed-resolution model to perform temporally synchronized and spatially consistent denoising across arbitrarily sized equirectangular videos. We further design a Global Motion Guidance module to jointly optimize local detail fidelity and global motion coherence. Additionally, we integrate a spatiotemporal modulation architecture with a training-free inference-time scaling strategy, supporting zero-shot, real-time generation under constant GPU memory. Experiments demonstrate that our approach significantly outperforms state-of-the-art methods in both visual quality and motion continuity.
📝 Abstract
The increasing demand for immersive AR/VR applications and spatial intelligence has heightened the need to generate high-quality scene-level and 360° panoramic videos. However, most video diffusion models are constrained by limited resolution and aspect ratio, which restricts their applicability to scene-level dynamic content synthesis. In this work, we propose DynamicScaler, which addresses these challenges by enabling spatially scalable and panoramic dynamic scene synthesis that preserves coherence across panoramic scenes of arbitrary size. Specifically, we introduce an Offset Shifting Denoiser that performs efficient, synchronous, and coherent denoising of panoramic dynamic scenes with a fixed-resolution diffusion model through a seamless rotating window, ensuring smooth boundary transitions and consistency across the entire panoramic space while accommodating varying resolutions and aspect ratios. Additionally, we employ a Global Motion Guidance mechanism to ensure both local detail fidelity and global motion continuity. Extensive experiments demonstrate that our method achieves superior content and motion quality in panoramic scene-level video generation, offering a training-free, efficient, and scalable solution for immersive dynamic scene creation with constant VRAM consumption regardless of output video resolution. Project page is available at https://dynamicscaler.pages.dev/new.
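The rotating-window denoising idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: `denoise_window`, the offset schedule, and all array shapes are illustrative assumptions. The key points it demonstrates are (a) a fixed-width window tiling the equirectangular latent, and (b) a per-step horizontal rotation whose wrap-around makes the left/right panorama boundary seamless, so every longitude is eventually denoised away from a window seam.

```python
import numpy as np

def offset_shift_denoise(latent, denoise_window, window_w, num_steps):
    """Sketch of rotational sliding-window denoising over an
    equirectangular latent of shape (H, W, C).

    At each step the panorama is rolled horizontally by a varying
    offset before fixed-width windows are denoised independently;
    np.roll's wrap-around treats the panorama as a cylinder, so the
    0°/360° boundary is never a hard seam. All names here are
    hypothetical, not the paper's API.
    """
    h, w, c = latent.shape
    assert w % window_w == 0, "windows must tile the panorama width"
    for step in range(num_steps):
        # shift the window seams by half a window each step
        offset = (step * window_w // 2) % w
        rolled = np.roll(latent, -offset, axis=1)
        # denoise each fixed-resolution window with the base model
        for x in range(0, w, window_w):
            rolled[:, x:x + window_w] = denoise_window(
                rolled[:, x:x + window_w], step)
        # undo the rotation so the latent stays in canonical longitude
        latent = np.roll(rolled, offset, axis=1)
    return latent
```

Because the window width is fixed, peak memory depends only on `window_w`, not on the panorama width, which is consistent with the constant-VRAM claim in the abstract.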