Continuous Speculative Decoding for Autoregressive Image Generation

📅 2024-11-18
🏛️ arXiv.org
📈 Citations: 8
Influential: 0
🤖 AI Summary
Continuous autoregressive visual generation models suffer from high inference latency, while existing speculative decoding methods are restricted to discrete token spaces and lack theoretical foundations or practical techniques for continuous-valued outputs. Method: This work pioneers the extension of speculative decoding to continuous visual generation. We propose a diffusion-prior-based continuous acceptance criterion, design a denoising trajectory alignment mechanism and token pre-filling strategy to mitigate distribution mismatch, and establish a continuous accept-reject sampling framework with analytically derived upper bounds on approximation error. Contribution/Results: Our approach achieves a 2.33× inference speedup on standard diffusion-based autoregressive models while provably preserving the exact output distribution of the original model. The implementation is publicly available.

📝 Abstract
Continuous-valued Autoregressive (AR) image generation models have demonstrated notable superiority over their discrete-token counterparts, showcasing considerable reconstruction quality and higher generation fidelity. However, the computational demands of the autoregressive framework result in significant inference overhead. While speculative decoding has proven effective in accelerating Large Language Models (LLMs), its adaptation to continuous-valued visual autoregressive models remains unexplored. This work generalizes the speculative decoding algorithm from discrete tokens to continuous space. By analyzing the intrinsic properties of the output distribution, we establish a tailored acceptance criterion for the diffusion distributions prevalent in such models. To overcome the inconsistency that arises in speculative decoding output distributions, we introduce denoising trajectory alignment and token pre-filling methods. Additionally, we identify the hard-to-sample modified distribution in the rejection phase. To mitigate this issue, we propose a meticulous acceptance-rejection sampling method with a proper upper bound, thereby circumventing complex integration. Experimental results show that our continuous speculative decoding achieves a remarkable $2.33\times$ speed-up on off-the-shelf models while maintaining the output distribution. Code will be available at https://github.com/MarkXCloud/CSpD
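The two sampling steps the abstract describes (a continuous accept/reject test on the draft sample, then acceptance-rejection sampling of the modified distribution on rejection) can be pictured with a minimal 1-D sketch. This is an illustration under simplifying assumptions, not the paper's implementation: 1-D Gaussians stand in for the draft density `q` and the target diffusion density `p`, and all function names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x (toy stand-in for a model density)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def speculative_step(mu_q, sigma_q, mu_p, sigma_p):
    """One continuous speculative-decoding step with Gaussian stand-ins.

    Accept the draft sample x ~ q with probability min(1, p(x)/q(x));
    on rejection, resample from the residual r(x) ∝ max(0, p(x) - q(x))
    via acceptance-rejection, avoiding the normalizing integral of r.
    """
    x = rng.normal(mu_q, sigma_q)                 # draft sample x ~ q
    p_x = gaussian_pdf(x, mu_p, sigma_p)
    q_x = gaussian_pdf(x, mu_q, sigma_q)
    if rng.uniform() < min(1.0, p_x / q_x):       # continuous acceptance test
        return x, True
    # Rejection phase: propose from p and accept with probability
    # max(0, 1 - q(y)/p(y)) <= 1, since p(y) * max(0, 1 - q(y)/p(y))
    # = max(0, p(y) - q(y)) is proportional to the residual density.
    while True:
        y = rng.normal(mu_p, sigma_p)
        accept_prob = max(0.0, 1.0 - gaussian_pdf(y, mu_q, sigma_q)
                                     / gaussian_pdf(y, mu_p, sigma_p))
        if rng.uniform() < accept_prob:
            return y, False

samples = [speculative_step(0.0, 1.0, 0.5, 1.0)[0] for _ in range(20000)]
# The combined accept/resample procedure leaves samples distributed as p,
# so the empirical mean should be close to mu_p = 0.5.
print(np.mean(samples))
```

The key point the sketch reproduces is that the output distribution stays exactly `p` regardless of how good the draft `q` is; the draft only affects how often the cheap accept path is taken.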
Problem

Research questions and friction points this paper is trying to address.

Accelerating continuous autoregressive models for image generation
Overcoming low acceptance rates in speculative decoding
Sampling from the modified (residual) distribution, which lacks an analytic expression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous speculative decoding for autoregressive image generation
Denoising trajectory alignment and token pre-filling strategies
Acceptance-rejection sampling with appropriate upper bound
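The denoising trajectory alignment idea above can be loosely illustrated as follows: run the draft and target diffusion heads deterministically (DDIM-style) from the same starting noise along the same schedule, so that any difference between the two outputs is attributable to the models rather than to sampling noise. This is a toy sketch under stated assumptions; the noise predictors and the schedule are hypothetical, not the paper's mechanism.

```python
import numpy as np

def ddim_like_trajectory(eps_model, x_T, alphas):
    """Deterministic DDIM-style denoising from a shared starting noise x_T.

    eps_model is a toy noise predictor (hypothetical stand-in for a diffusion
    head); alphas is a toy alpha-bar schedule running from noisy to clean.
    """
    x = x_T
    for a_t, a_prev in zip(alphas[:-1], alphas[1:]):
        eps = eps_model(x, a_t)
        x0_hat = (x - np.sqrt(1 - a_t) * eps) / np.sqrt(a_t)   # predicted clean token
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1 - a_prev) * eps
    return x

rng = np.random.default_rng(0)
x_T = rng.normal(size=4)                 # shared starting noise for both heads
alphas = np.linspace(0.05, 0.999, 8)     # toy schedule: high noise -> clean

draft_eps = lambda x, a: 0.9 * x         # hypothetical draft noise predictor
target_eps = lambda x, a: 1.0 * x        # hypothetical target noise predictor

# Both trajectories consume identical randomness, so the per-token comparison
# needed by the acceptance test is between the models, not between noise draws.
x_draft = ddim_like_trajectory(draft_eps, x_T, alphas)
x_target = ddim_like_trajectory(target_eps, x_T, alphas)
```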