Continuous Speculative Decoding for Autoregressive Image Generation

📅 2024-11-18
🏛️ arXiv.org
📈 Citations: 8
Influential: 0
🤖 AI Summary
Continuous autoregressive visual generation models suffer from high inference latency, while existing speculative decoding methods are restricted to discrete token spaces and lack theoretical foundations or practical techniques for continuous-valued outputs. Method: This work pioneers the extension of speculative decoding to continuous visual generation. We propose a diffusion-prior-based continuous acceptance criterion, design a denoising trajectory alignment mechanism and token pre-filling strategy to mitigate distribution mismatch, and establish a continuous accept-reject sampling framework with analytically derived upper bounds on approximation error. Contribution/Results: Our approach achieves a 2.33× inference speedup on standard diffusion-based autoregressive models while provably preserving the exact output distribution of the original model. The implementation is publicly available.

📝 Abstract
Continuous-valued Autoregressive (AR) image generation models have demonstrated notable superiority over their discrete-token counterparts, showcasing considerable reconstruction quality and higher generation fidelity. However, the computational demands of the autoregressive framework result in significant inference overhead. While speculative decoding has proven effective in accelerating Large Language Models (LLMs), its adaptation to continuous-valued visual autoregressive models remains unexplored. This work generalizes the speculative decoding algorithm from discrete tokens to continuous space. By analyzing the intrinsic properties of the output distribution, we establish a tailored acceptance criterion for the diffusion distributions prevalent in such models. To overcome the inconsistency that arises in speculative decoding output distributions, we introduce denoising trajectory alignment and token pre-filling methods. Additionally, we identify the hard-to-sample modified distribution in the rejection phase. To mitigate this issue, we propose a meticulous acceptance-rejection sampling method with a proper upper bound, thereby circumventing complex integration. Experimental results show that our continuous speculative decoding achieves a remarkable $2.33\times$ speed-up on off-the-shelf models while maintaining the output distribution. Code will be available at https://github.com/MarkXCloud/CSpD
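The two sampling steps the abstract describes (a continuous accept/reject test on the draft sample, then acceptance-rejection sampling of the modified distribution on rejection) can be pictured with a minimal 1-D sketch. This is an illustration under simplifying assumptions, not the paper's implementation: 1-D Gaussians stand in for the draft density `q` and the target diffusion density `p`, and all function names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x (toy stand-in for a model density)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def speculative_step(mu_q, sigma_q, mu_p, sigma_p):
    """One continuous speculative-decoding step with Gaussian stand-ins.

    Accept the draft sample x ~ q with probability min(1, p(x)/q(x));
    on rejection, resample from the residual r(x) ∝ max(0, p(x) - q(x))
    via acceptance-rejection, avoiding the normalizing integral of r.
    """
    x = rng.normal(mu_q, sigma_q)                 # draft sample x ~ q
    p_x = gaussian_pdf(x, mu_p, sigma_p)
    q_x = gaussian_pdf(x, mu_q, sigma_q)
    if rng.uniform() < min(1.0, p_x / q_x):       # continuous acceptance test
        return x, True
    # Rejection phase: propose from p and accept with probability
    # max(0, 1 - q(y)/p(y)) <= 1, since p(y) * max(0, 1 - q(y)/p(y))
    # = max(0, p(y) - q(y)) is proportional to the residual density.
    while True:
        y = rng.normal(mu_p, sigma_p)
        accept_prob = max(0.0, 1.0 - gaussian_pdf(y, mu_q, sigma_q)
                                     / gaussian_pdf(y, mu_p, sigma_p))
        if rng.uniform() < accept_prob:
            return y, False

samples = [speculative_step(0.0, 1.0, 0.5, 1.0)[0] for _ in range(20000)]
# The combined accept/resample procedure leaves samples distributed as p,
# so the empirical mean should be close to mu_p = 0.5.
print(np.mean(samples))
```

The key point the sketch reproduces is that the output distribution stays exactly `p` regardless of how good the draft `q` is; the draft only affects how often the cheap accept path is taken.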
Problem

Research questions and friction points this paper is trying to address.

Accelerating continuous autoregressive models for image generation
Overcoming low acceptance rates in speculative decoding
Sampling from the modified (residual) distribution, which lacks an analytic expression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous speculative decoding for autoregressive image generation
Denoising trajectory alignment and token pre-filling strategies
Acceptance-rejection sampling with appropriate upper bound
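The denoising trajectory alignment idea above can be loosely illustrated as follows: run the draft and target diffusion heads deterministically (DDIM-style) from the same starting noise along the same schedule, so that any difference between the two outputs is attributable to the models rather than to sampling noise. This is a toy sketch under stated assumptions; the noise predictors and the schedule are hypothetical, not the paper's mechanism.

```python
import numpy as np

def ddim_like_trajectory(eps_model, x_T, alphas):
    """Deterministic DDIM-style denoising from a shared starting noise x_T.

    eps_model is a toy noise predictor (hypothetical stand-in for a diffusion
    head); alphas is a toy alpha-bar schedule running from noisy to clean.
    """
    x = x_T
    for a_t, a_prev in zip(alphas[:-1], alphas[1:]):
        eps = eps_model(x, a_t)
        x0_hat = (x - np.sqrt(1 - a_t) * eps) / np.sqrt(a_t)   # predicted clean token
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1 - a_prev) * eps
    return x

rng = np.random.default_rng(0)
x_T = rng.normal(size=4)                 # shared starting noise for both heads
alphas = np.linspace(0.05, 0.999, 8)     # toy schedule: high noise -> clean

draft_eps = lambda x, a: 0.9 * x         # hypothetical draft noise predictor
target_eps = lambda x, a: 1.0 * x        # hypothetical target noise predictor

# Both trajectories consume identical randomness, so the per-token comparison
# needed by the acceptance test is between the models, not between noise draws.
x_draft = ddim_like_trajectory(draft_eps, x_T, alphas)
x_target = ddim_like_trajectory(target_eps, x_T, alphas)
```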