Ultra Flash: Scaling Real-Time Streaming Video Generation to High Resolutions

Date: 2026-06-08
Citations: 0
Influential: 0
AI Summary
This work addresses efficient real-time generation of high-resolution video with autoregressive video diffusion models, which are typically confined to low resolutions such as 480p. The authors propose Ultra Flash, a cascaded streaming framework that combines architecture-preserving T2V-to-TV2V super-resolution training, a causal streaming latent upsampler, and a high-resolution decoder. Together with single-step distillation, dynamic cache management, and self-forcing preference optimization, the framework reaches approximately 30 FPS at 1K and 18 FPS at 2K on a single GPU. The authors report improved spatiotemporal coherence and inference efficiency while maintaining state-of-the-art visual quality.
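To put the reported frame rates in perspective, they can be converted into an average per-frame time budget. The snippet below is only this arithmetic; the helper name `frame_budget_ms` is an illustrative assumption, and the figures are amortized averages, since a streaming model that emits frames in chunks does not necessarily spend equal time on each frame.

```python
def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


# Frame rates taken from the summary above (approximate values).
for label, fps in [("1K", 30.0), ("2K", 18.0)]:
    print(f"{label}: {frame_budget_ms(fps):.1f} ms per frame")
# prints:
# 1K: 33.3 ms per frame
# 2K: 55.6 ms per frame
```

In other words, the whole cascade (base generation, upsampling, super-resolution, decoding) has to fit in roughly 33 ms per frame on average at 1K and roughly 56 ms at 2K.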
Abstract
While recent autoregressive video diffusion models achieve remarkable streaming quality, they remain confined to low resolutions (e.g., 480p), leaving efficient, scalable, real-time high-resolution video generation a fundamental open challenge. To bridge this gap, we present Ultra Flash, a cascaded streaming framework capable of real-time high-resolution video generation. Ultra Flash achieves ~30 FPS at 1K resolution and ~18 FPS at 2K resolution on a single GPU through three key contributions: (1) an architecture-preserving T2V-to-TV2V super-resolution training paradigm coupled with an AIGC-oriented data degradation pipeline that effectively preserves the generative capability of the base model, enabling enhanced high-resolution detail when cascaded after mainstream low-resolution generative models; (2) a causal streaming latent upsampler paired with a high-resolution decoder, which enhances spatiotemporal coherence while enabling efficient latent spatial scaling and precise high-resolution decoding with negligible computational overhead; and (3) a cascaded high-resolution streaming video generation optimization scheme that first performs hybrid-reward-enhanced sparse causalization and single-step distillation of the super-resolution model, then introduces cascaded streaming self-forcing preference optimization with dynamic cache management, jointly enhancing overall coherence, improving quality, and enabling real-time high-resolution streaming video generation. Extensive experiments demonstrate that Ultra Flash reliably produces ultra-high-resolution streaming video while maintaining state-of-the-art visual quality and superior efficiency.
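The cascaded, causal data flow described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: every name here (`low_res_stream`, `upsample`, `refine`, `stream_high_res`) is hypothetical, the "latents" are plain lists of floats, and the fixed-size sliding window is only a simple stand-in for the paper's dynamic cache management, whose actual policy is not given in the abstract. The sketch illustrates only the streaming property: each high-resolution chunk depends on the current low-resolution chunk and a bounded set of past outputs, never on future chunks.

```python
from collections import deque
from typing import Deque, Iterator, List

Chunk = List[float]  # toy stand-in for a latent chunk (assumption)


def low_res_stream(num_chunks: int) -> Iterator[Chunk]:
    """Hypothetical base generator: yields low-resolution chunks in temporal order."""
    for t in range(num_chunks):
        yield [float(t)] * 4


def upsample(chunk: Chunk, scale: int = 2) -> Chunk:
    """Toy causal upsampler: repeats each value, using only the current chunk."""
    return [v for v in chunk for _ in range(scale)]


def refine(chunk: Chunk, cache: Deque[Chunk]) -> Chunk:
    """Toy single-pass refiner: blends the chunk with the mean of cached past outputs."""
    if not cache:
        return chunk
    past_mean = sum(sum(c) / len(c) for c in cache) / len(cache)
    return [0.5 * v + 0.5 * past_mean for v in chunk]


def stream_high_res(num_chunks: int, cache_size: int = 3) -> Iterator[Chunk]:
    """Cascade: base stream -> upsample -> refine, with a bounded cache of past outputs."""
    # deque(maxlen=...) evicts the oldest entry automatically once the window is full.
    cache: Deque[Chunk] = deque(maxlen=cache_size)
    for low in low_res_stream(num_chunks):
        high = refine(upsample(low), cache)
        cache.append(high)
        yield high


frames = list(stream_high_res(5))
```

Because the loop is causal, `list(stream_high_res(3))` equals the first three entries of `frames`: generating more chunks never changes earlier ones, which is what allows chunks to be emitted as soon as they are ready. The bounded cache keeps memory use constant regardless of video length.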
Problem

Research questions and friction points this paper is trying to address.

real-time video generation
high-resolution video
streaming video
scalable video synthesis
video diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

real-time streaming video generation
high-resolution video synthesis
cascaded diffusion models
causal latent upsampling
dynamic cache management