Not All Frames Deserve Full Computation: Accelerating Autoregressive Video Generation via Selective Computation and Predictive Extrapolation

📅 2026-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost of multi-step denoising in autoregressive video diffusion models. Existing training-free acceleration methods are limited to a binary choice between caching and recomputation, so they handle intermediate cases poorly, and under asynchronous scheduling they process the entire valid frame interval uniformly, which introduces redundant computation. To overcome these limitations, we propose SCOPE, a training-free framework built on a tri-modal scheduler (cache, predict, recompute) and augmented with selective computation. Prediction via noise-level Taylor extrapolation bridges the gap between caching and recomputation, and error propagation analysis backs its stability controls. SCOPE is the first to integrate tri-modal scheduling with an extrapolation-based prediction mechanism, achieving up to 4.73× speedup on MAGI-1 and SkyReels-V2 while maintaining video quality comparable to the original output and outperforming all existing training-free baselines.
📝 Abstract
Autoregressive (AR) video diffusion models enable long-form video generation but remain expensive due to repeated multi-step denoising. Existing training-free acceleration methods rely on binary cache-or-recompute decisions, overlooking intermediate cases where direct reuse is too coarse yet full recomputation is unnecessary. Moreover, asynchronous AR schedules assign different noise levels to co-generated frames, yet existing methods process the entire valid interval uniformly. To address these AR-specific inefficiencies, we present SCOPE, a training-free framework for efficient AR video diffusion. SCOPE introduces a tri-modal scheduler over cache, predict, and recompute, where prediction via noise-level Taylor extrapolation fills the gap between reuse and recomputation with explicit stability controls backed by error propagation analysis. It further introduces selective computation that restricts execution to the active frame interval. On MAGI-1 and SkyReels-V2, SCOPE achieves up to 4.73x speedup while maintaining quality comparable to the original output, outperforming all training-free baselines.
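To make the three ideas in the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a scalar "drift" estimate drives the cache/predict/recompute choice, that prediction is a first-order Taylor step along the noise level built from the two most recent model outputs, and that a frame counts as active when its noise level is above zero. The function names, the thresholds `low` and `high`, and the activity criterion are all hypothetical and are not taken from the paper.

```python
from enum import Enum

import numpy as np


class Mode(Enum):
    CACHE = "cache"          # reuse the previous output as-is
    PREDICT = "predict"      # extrapolate from previous outputs
    RECOMPUTE = "recompute"  # run the full model


def choose_mode(drift, low=0.05, high=0.20):
    """Pick a mode from a scalar drift estimate (hypothetical criterion and thresholds)."""
    if drift < low:
        return Mode.CACHE
    if drift < high:
        return Mode.PREDICT
    return Mode.RECOMPUTE


def taylor_predict(prev_out, prev_prev_out, sigma, sigma_prev, sigma_prev_prev):
    """First-order Taylor extrapolation of the model output along the noise level.

    The derivative is approximated by a finite difference of the two most
    recent outputs. If the two noise levels coincide, fall back to plain reuse.
    """
    d_sigma = sigma_prev - sigma_prev_prev
    if d_sigma == 0:
        return prev_out
    slope = (prev_out - prev_prev_out) / d_sigma
    return prev_out + slope * (sigma - sigma_prev)


def active_interval(noise_levels, eps=1e-8):
    """Return [start, stop) spanning frames whose noise level exceeds eps.

    Assumption: a noise level of 0 means the frame is fully denoised.
    """
    idx = np.flatnonzero(np.asarray(noise_levels, dtype=float) > eps)
    if idx.size == 0:
        return 0, 0
    return int(idx[0]), int(idx[-1]) + 1


def selective_forward(model, frames, noise_levels, cached_out):
    """Run `model` only on the active interval; keep cached outputs elsewhere."""
    levels = np.asarray(noise_levels, dtype=float)
    start, stop = active_interval(levels)
    out = cached_out.copy()
    if stop > start:
        out[start:stop] = model(frames[start:stop], levels[start:stop])
    return out
```

In this sketch, `model` is any callable that takes a slice of frames and their noise levels and returns an array of the same leading length. When the model output is exactly linear in the noise level, `taylor_predict` reproduces it exactly; otherwise the extrapolation error grows with the step size, which is the kind of drift that stability controls such as those mentioned in the abstract would need to bound.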
Problem

Research questions and friction points this paper is trying to address.

autoregressive video generation
diffusion models
computational efficiency
selective computation
noise-level scheduling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective Computation
Predictive Extrapolation
Autoregressive Video Diffusion
Training-Free Acceleration
Tri-Modal Scheduler
Hanshuai Cui
Institute of Artificial Intelligence and Future Networks, Beijing Normal University, Zhuhai 519087, China
Zhiqing Tang
Associate Professor, Beijing Normal University
Edge Computing, Edge AI Systems, Container, Reinforcement Learning
Zhi Yao
Institute of Artificial Intelligence and Future Networks, Beijing Normal University, Zhuhai 519087, China
Fanshuai Meng
Institute of Artificial Intelligence and Future Networks, Beijing Normal University, Zhuhai 519087, China
Weijia Jia
FIEEE, Chair Professor, Beijing Normal University and UIC
Cyber Intelligent Computing, Networking
Wei Zhao
Shenzhen University of Advanced Technology
cyber security, real-time communications, real-time systems, Internet of things, computer science