Lyra 2.0: Explorable Generative 3D Worlds

📅 2026-04-14

🤖 AI Summary

Existing video generation models often suffer from spatial forgetting and temporal drift in long-horizon scenarios that involve large camera motions or revisits to previously observed regions, which undermines their 3D consistency. To address this, the work proposes a camera-controllable video generation framework that combines feed-forward 3D reconstruction with geometry-guided information routing: per-frame 3D geometry is used to retrieve relevant past frames and to establish dense correspondences with the target viewpoints, while appearance is left to the generative prior. To mitigate temporal drift, the model is additionally trained on self-augmented histories that expose it to its own degraded outputs. The approach substantially extends the length of explorable, 3D-consistent video trajectories, which are then used to fine-tune feed-forward reconstruction models that reliably recover high-quality 3D scenes.
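
Both routing steps (retrieving relevant past frames and matching them densely to a target viewpoint) reduce to moving stored 3D geometry between cameras. The summary does not say how frames are scored or how correspondences are encoded, so the following is only a minimal pinhole-camera sketch under our own assumptions: each stored frame keeps a hypothetical `Camera` (world-to-camera rotation `R`, translation `t`, intrinsics) plus world-space points lifted from its depth; `retrieve` ranks frames by the fraction of their points visible from the target camera (a simple visibility heuristic we chose, not necessarily Lyra 2.0's criterion); and `correspondence` maps a source pixel with known depth to its location in the target view.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Camera:
    """Hypothetical pinhole camera: x_cam = R @ x_world + t."""
    R: Mat3
    t: Vec3
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int


def _matvec(M: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(M[i][j] * v[j] for j in range(3)) for i in range(3))


def _transpose(M: Mat3) -> Mat3:
    return tuple(tuple(M[j][i] for j in range(3)) for i in range(3))


def unproject(cam: Camera, u: float, v: float, depth: float) -> Vec3:
    """Lift pixel (u, v) with known depth into world coordinates."""
    x_cam = ((u - cam.cx) / cam.fx * depth, (v - cam.cy) / cam.fy * depth, depth)
    shifted = tuple(x_cam[i] - cam.t[i] for i in range(3))
    return _matvec(_transpose(cam.R), shifted)  # R is orthonormal, so R^-1 = R^T


def project(cam: Camera, p_world: Vec3) -> Optional[Tuple[float, float]]:
    """Project a world point to pixels; None if behind the camera or off-image."""
    x = _matvec(cam.R, p_world)
    x = tuple(x[i] + cam.t[i] for i in range(3))
    if x[2] <= 1e-6:
        return None
    u = cam.fx * x[0] / x[2] + cam.cx
    v = cam.fy * x[1] / x[2] + cam.cy
    if 0 <= u < cam.width and 0 <= v < cam.height:
        return (u, v)
    return None


def correspondence(src: Camera, dst: Camera, u: float, v: float, depth: float):
    """Where does source pixel (u, v) land in the destination view?"""
    return project(dst, unproject(src, u, v, depth))


def retrieve(target: Camera, memory: Sequence[Tuple[Camera, List[Vec3]]], k: int) -> List[int]:
    """Return indices of the k stored frames whose points are most visible from target."""
    scores = []
    for idx, (_, points) in enumerate(memory):
        visible = sum(project(target, p) is not None for p in points)
        scores.append((visible / max(len(points), 1), idx))
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest visibility first, ties by index
    return [idx for _, idx in scores[:k]]


# Usage: camera 1 sits one unit to the right of camera 0, both looking down +z.
I3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
cam0 = Camera(I3, (0.0, 0.0, 0.0), 100.0, 100.0, 64.0, 64.0, 128, 128)
cam1 = Camera(I3, (-1.0, 0.0, 0.0), 100.0, 100.0, 64.0, 64.0, 128, 128)
print(correspondence(cam0, cam1, 64.0, 64.0, 5.0))
```

In this reading the geometry never paints pixels: it only decides which stored frames, and which of their pixels, are routed to the generator for a given target view, which matches the division of labor the summary describes.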

📝 Abstract

Recent advances in video generation enable a new paradigm for 3D scene creation: generating camera-controlled videos that simulate scene walkthroughs, then lifting them to 3D via feed-forward reconstruction techniques. This generative reconstruction approach combines the visual fidelity and creative capacity of video models with 3D outputs ready for real-time rendering and simulation. Scaling to large, complex environments requires 3D-consistent video generation over long camera trajectories with large viewpoint changes and location revisits, a setting where current video models degrade quickly. Existing methods for long-horizon generation are fundamentally limited by two forms of degradation: spatial forgetting and temporal drifting. As exploration proceeds, previously observed regions fall outside the model's temporal context, forcing the model to hallucinate structures when revisited. Meanwhile, autoregressive generation accumulates small synthesis errors over time, gradually distorting scene appearance and geometry. We present Lyra 2.0, a framework for generating persistent, explorable 3D worlds at scale. To address spatial forgetting, we maintain per-frame 3D geometry and use it solely for information routing (retrieving relevant past frames and establishing dense correspondences with the target viewpoints), while relying on the generative prior for appearance synthesis. To address temporal drifting, we train with self-augmented histories that expose the model to its own degraded outputs, teaching it to correct drift rather than propagate it. Together, these enable substantially longer and 3D-consistent video trajectories, which we leverage to fine-tune feed-forward reconstruction models that reliably recover high-quality 3D scenes.
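
The abstract does not spell out how the self-augmented histories are constructed, so this is a toy sketch of one plausible reading, under our own assumptions. Frames are flat lists of floats; `drifting_model` is a hypothetical stand-in generator that copies the last frame with a small bias, mimicking accumulated synthesis error; and the hypothetical `make_training_pair` keeps the first `n_context` clean frames, then (with probability `p_self`) replaces the rest of the history with the model's own autoregressive rollout, while the supervision target stays ground truth. A real pipeline would run the video model without gradients to produce these frames; the mixing probability and context length here are arbitrary.

```python
import random
from typing import Callable, List, Tuple

Frame = List[float]  # toy stand-in for an image: a flat list of pixel values
Model = Callable[[List[Frame]], Frame]  # maps a history of frames to the next frame


def rollout(model: Model, context: List[Frame], steps: int) -> List[Frame]:
    """Autoregressively generate `steps` frames, feeding each output back in."""
    history = list(context)
    generated: List[Frame] = []
    for _ in range(steps):
        nxt = model(history)
        generated.append(nxt)
        history.append(nxt)
    return generated


def make_training_pair(
    model: Model,
    clip: List[Frame],
    n_context: int,
    p_self: float,
    rng: random.Random,
) -> Tuple[List[Frame], Frame]:
    """Build one (history, target) pair from a ground-truth clip.

    With probability p_self, every history frame after the first n_context
    clean frames is replaced by the model's own rollout, so the conditioning
    carries the model's accumulated errors while the target stays clean.
    """
    history, target = clip[:-1], clip[-1]
    if rng.random() >= p_self:
        return list(history), target
    context = history[:n_context]
    return context + rollout(model, context, len(history) - n_context), target


def drifting_model(history: List[Frame]) -> Frame:
    """Hypothetical generator: copies the last frame plus a small bias (drift)."""
    return [v + 0.01 for v in history[-1]]


# Usage: a static scene (every frame identical) seen through a drifting model.
clip = [[0.5, 0.5] for _ in range(6)]
rng = random.Random(0)
augmented, target = make_training_pair(drifting_model, clip, 2, 1.0, rng)
print([round(v, 2) for v in augmented[-1]], target)
```

Because the conditioning contains drifted frames while the target is the true frame, minimizing the usual training loss on such pairs rewards pulling the next frame back toward the scene rather than continuing the drift.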

Problem

Research questions and friction points this paper is trying to address.

spatial forgetting

temporal drifting

3D-consistent video generation

long-horizon generation

generative reconstruction

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D-consistent video generation

spatial forgetting

temporal drifting