MORPHOS: Autoregressive 4D Generation with Temporal Structured Latents

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing methods for dynamic 3D generation are typically tied to a single 3D representation, struggle to model topological changes, or lose temporal coherence over long videos. This work proposes MORPHOS, an autoregressive framework that generates dynamic 3D assets from videos. It introduces Temporal Structured Latents (T-SLAT), a unified 4D representation that jointly encodes geometry and appearance along the temporal dimension and supports meshes, 3D Gaussians, and radiance fields. Geometry and appearance are generated frame by frame with causal attention, with each frame conditioned on its preceding history. To mitigate error accumulation in autoregressive generation, the method adds a temporal-structural augmentation. Experiments show that MORPHOS achieves state-of-the-art appearance quality and competitive geometry results across multiple benchmarks, generalizes across representations, and remains robust in long-horizon generation.

📝 Abstract

We present MORPHOS, a novel autoregressive framework that generates dynamic 3D assets from videos across diverse representations, including meshes, 3D Gaussians, and radiance fields. Existing methods are typically limited to a single representation, struggle to model topological changes, or fail to maintain temporal consistency over long videos. To address these limitations, we introduce the Temporal Structured Latents (T-SLAT), a unified 4D representation that jointly encodes geometry and appearance along the temporal dimension. Leveraging T-SLAT, MORPHOS autoregressively generates dynamic 3D assets via causal attention, conditioning each frame on its preceding history to ensure temporal consistency while handling evolving topologies. We also propose a temporal-structural augmentation to mitigate error accumulation in autoregressive generation. MORPHOS achieves state-of-the-art performance in appearance and competitive results in geometry across multiple benchmarks, demonstrating superior generalization across various representations and robustness in long-horizon generation.
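
The abstract describes conditioning each frame on its preceding history via causal attention. As a rough illustration of that general idea only (not the authors' implementation), the NumPy sketch below builds a frame-level causal mask, so that tokens in one frame attend to tokens in the same and earlier frames but never to later ones. The function names, the per-frame token counts, and the single-head attention are all assumptions made for this example; the paper's actual architecture and T-SLAT layout are not specified here.

```python
import numpy as np

def frame_causal_mask(tokens_per_frame):
    """Boolean mask where entry [i, j] is True if token i may attend to token j.

    Hypothetical helper: tokens are grouped by frame, and a token may see
    every token in its own frame and in all earlier frames.
    """
    frame_ids = np.repeat(np.arange(len(tokens_per_frame)), tokens_per_frame)
    return frame_ids[:, None] >= frame_ids[None, :]

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)      # block attention to future frames
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Three frames with 2, 3, and 1 latent tokens (counts chosen arbitrarily).
mask = frame_causal_mask([2, 3, 1])
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 6, 4))
out = masked_attention(q, k, v, mask)
```

Letting per-frame token counts differ is a deliberate choice in this sketch: it mirrors, in a simplified way, how a frame-wise latent set could grow or shrink as topology evolves, though whether MORPHOS does this is not stated in the abstract.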

Problem

Research questions and friction points this paper is trying to address.

dynamic 3D generation

temporal consistency

topological changes

4D representation

autoregressive generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Structured Latents

autoregressive 4D generation

dynamic 3D assets