PathSpec: Parallel-Path Relaxed Speculative Jacobi Decoding for Accelerating Auto-Regressive Text-to-Image Generation

📅 2026-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Autoregressive text-to-image models suffer from low inference efficiency due to excessively long token sequences required for high-resolution image generation. Existing acceleration methods are constrained by chain-structured draft mechanisms that offer limited search space and low acceptance rates. To address these limitations, this work proposes PathSpec, a parallel path-based speculative Jacobi decoding framework that introduces a novel multi-sequence draft tree structure to expand the token search space and incorporates a cross-path semantic relaxation verification mechanism to significantly improve acceptance rates. PathSpec seamlessly integrates with existing relaxed sampling techniques and achieves substantial speedups of 4.14×, 3.95×, and 4.18× on Parti-Prompts, MSCOCO2017, and T2ICompBench benchmarks, respectively, outperforming state-of-the-art relaxed sampling approaches.

📝 Abstract

The growing need for high-resolution image generation in autoregressive text-to-image models has resulted in extended token sequences, significantly increasing computational costs and inference times. However, existing state-of-the-art methods for accelerating autoregressive text-to-image models rely on chain-structured draft token sequences, leading to inefficient draft token search and limited acceptance lengths. To address this, we propose parallel-path cross-relaxed speculative Jacobi decoding (PathSpec), a novel framework that enhances efficiency through a multi-sequence draft tree structure. Our parallel-path speculative Jacobi decoding (PathExplore) expands the token search space, achieving a higher speedup ratio without sacrificing image quality. Additionally, we introduce cross-path relaxed verification (PathRelax) that exploits semantic similarities across sequences to further boost token acceptance rates. Evaluated on the Parti-Prompts, MSCOCO2017, and T2ICompBench datasets, our method achieves a speedup ratio of 4.14×, 3.95×, and 4.18×, respectively. Remarkably, PathExplore, without any relaxed sampling, outperforms relaxed sampling methods in the speedup ratio, such as GSD and LANTERN. Moreover, PathRelax's relaxation mechanism can be seamlessly integrated with other relaxation techniques, enabling further acceleration and providing an efficient solution for real-time text-to-image generation. Our code is available at https://github.com/Haodong-Lei-Ray/PathSpec.
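To make the contrast between a single draft chain and several parallel draft paths concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. It assumes greedy verification (a draft token is accepted only if it equals the target model's greedy next token) and picks the path with the longest accepted prefix; that selection rule, the function names, and the toy target model are all hypothetical choices made for illustration. The cross-path relaxed verification step is not modeled, because the abstract does not specify it.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: maps a token context to the target model's greedy next token.
TargetNext = Callable[[Sequence[int]], int]


def accepted_prefix_len(prefix: Sequence[int], draft: Sequence[int],
                        target_next: TargetNext) -> int:
    """Count the leading draft tokens that match the target's greedy choice."""
    context = list(prefix)
    accepted = 0
    for token in draft:
        if target_next(context) != token:
            break  # first mismatch ends the accepted prefix
        context.append(token)
        accepted += 1
    return accepted


def verify_parallel_paths(prefix: Sequence[int], paths: Sequence[Sequence[int]],
                          target_next: TargetNext) -> Tuple[int, List[int]]:
    """Return (index of best path, its accepted tokens) over all draft paths.

    A real system would score all paths in one batched forward pass;
    this toy version calls the target sequentially for clarity.
    """
    best_idx, best_len = 0, -1
    for i, path in enumerate(paths):
        n = accepted_prefix_len(prefix, path, target_next)
        if n > best_len:
            best_idx, best_len = i, n
    return best_idx, list(paths[best_idx][:best_len])


def toy_target(context: Sequence[int]) -> int:
    """Stand-in for a model: the next token is the context sum modulo 7."""
    return sum(context) % 7


prefix = [3, 1]
paths = [[4, 0, 2], [4, 1, 2], [5, 5, 5]]  # three hypothetical draft paths

single_chain = accepted_prefix_len(prefix, paths[0], toy_target)
best_idx, accepted = verify_parallel_paths(prefix, paths, toy_target)
print(single_chain, best_idx, accepted)
```

In this toy setup the first path alone yields one accepted token, while searching all three paths finds one whose three tokens are all accepted. This is the intuition behind a wider draft search space producing longer acceptance lengths per verification step.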

Problem

Research questions and friction points this paper is trying to address.

autoregressive text-to-image generation

speculative decoding

token sequence acceleration

high-resolution image generation

inference latency

Innovation

Methods, ideas, or system contributions that make the work stand out.

speculative decoding

parallel-path

relaxed verification