🤖 AI Summary
Autoregressive 3D mesh generation suffers from slow inference because vertex and face tokens are decoded sequentially, which hinders interactive and large-scale applications. To address this, we propose FlashMesh, a structured speculative decoding framework built on a predict-correct-verify paradigm that adapts speculative decoding to mesh generation. Tailored to the hourglass Transformer architecture, our method exploits the structural and geometric correlations among mesh tokens to enable parallel multi-token prediction across the face, point, and coordinate levels; a lightweight verification step checks the speculated tokens. By relaxing the token-by-token dependency of autoregressive decoding, our approach achieves up to a 2× speedup while also improving generation fidelity. Key contributions include: (1) a structured speculative decoding paradigm tailored to 3D mesh generation; (2) a hierarchical joint speculation strategy that integrates geometric and topological correlations; and (3) a verifiable generation mechanism that improves both efficiency and quality.
📝 Abstract
Autoregressive models can generate high-quality 3D meshes by sequentially producing vertices and faces, but their token-by-token decoding results in slow inference, limiting practical use in interactive and large-scale applications. We present FlashMesh, a fast and high-fidelity mesh generation framework that rethinks autoregressive decoding through a predict-correct-verify paradigm. The key insight is that mesh tokens exhibit strong structural and geometric correlations that enable confident multi-token speculation. FlashMesh leverages this by introducing a speculative decoding scheme tailored to the commonly used hourglass transformer architecture, enabling parallel prediction across face, point, and coordinate levels. Extensive experiments show that FlashMesh achieves up to a 2× speedup over standard autoregressive models while also improving generation fidelity. Our results demonstrate that structural priors in mesh data can be systematically harnessed to accelerate and enhance autoregressive generation.
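To make the predict-correct-verify idea concrete, the following is a minimal sketch of generic greedy speculative decoding on a toy token sequence. It is not FlashMesh's implementation: the names `target_next`, `draft_next`, `autoregressive`, and `speculative` are hypothetical, the "models" are simple deterministic functions, and the hourglass architecture and the face/point/coordinate hierarchy are not modeled. It only illustrates how several tokens can be proposed at once, checked against the slow model, and corrected at the first mismatch, so that the output matches plain token-by-token decoding.

```python
from typing import Callable, List, Tuple

Token = int
NextFn = Callable[[List[Token]], Token]


def target_next(prefix: List[Token]) -> Token:
    """Toy stand-in for the slow autoregressive model (assumption, not a real network)."""
    return (sum(prefix) * 31 + len(prefix)) % 128


def draft_next(prefix: List[Token]) -> Token:
    """Toy stand-in for a cheap drafter: agrees with the target except at every 4th position."""
    tok = target_next(prefix)
    return tok if len(prefix) % 4 != 3 else (tok + 1) % 128


def autoregressive(next_fn: NextFn, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(next_fn(seq))
    return seq


def speculative(target: NextFn, draft: NextFn, prompt: List[Token],
                n_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify decoding; returns the sequence and the number of verify passes."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(seq) < goal:
        # Predict: the drafter proposes k tokens ahead.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # Verify: a real model would score all k positions in one batched forward pass;
        # this toy loop calls the target per position but counts it as a single pass.
        passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # Correct: replace the first mismatch with the target's token and stop.
                accepted.append(expected)
                break
        seq.extend(accepted)
    return seq[:goal], passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = autoregressive(target_next, prompt, 16)
    fast, passes = speculative(target_next, draft_next, prompt, 16, k=4)
    print("identical output:", fast == baseline)
    print("verify passes:", passes, "vs. 16 sequential steps")
```

Because every accepted token equals what the target would have produced for the same prefix, the result is identical to the baseline, while the number of sequential verification passes is smaller than the number of generated tokens. Any real speedup depends on how often the drafter is right and on how cheap drafting is relative to a target forward pass.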