🤖 AI Summary
This work addresses the low efficiency and limited geometric fidelity of autoregressive 3D mesh generation. We propose a parallel generative framework based on discrete diffusion models (DDMs). Methodologically, we adopt a decoupled two-stage paradigm: (i) topology sculpting, which produces plausible face-vertex sequences, followed by (ii) shape refinement, which enhances geometric detail. Key technical contributions include an improved hourglass network with bidirectional attention, face- and vertex-level rotary position embeddings (RoPE), and a connection loss that constrains connectivity to improve topological validity and spatial modeling accuracy. Experiments show that the method generates high-quality, artist-style 3D meshes with up to 10,000 faces at a spatial resolution of up to $1024^3$ on complex datasets, with significant improvements in generation speed, topological consistency, and surface detail fidelity over prior approaches.
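To make the face-vertex sequence and the $1024^3$ resolution concrete, the following sketch serializes a triangle mesh into tokens and applies a multi-axis rotary embedding. It assumes coordinates normalized to $[-1, 1]$, uniform 10-bit quantization per axis, and 9 tokens per face (3 vertices × 3 coordinates); under that layout a 10,000-face mesh becomes 90,000 tokens. The helpers `quantize`, `serialize_mesh`, `rope_rotate`, and `multi_axis_rope` are hypothetical, and splitting the channels into face, vertex-slot, and sequence chunks is only one plausible reading of face-vertex-sequence-level RoPE, not the paper's published design.

```python
import numpy as np

RESOLUTION = 1024  # assumed bins per axis, giving a 1024^3 grid


def quantize(coord, resolution=RESOLUTION):
    """Map a coordinate in [-1, 1] to an integer bin in [0, resolution - 1]."""
    clipped = min(max(coord, -1.0), 1.0)
    return min(int((clipped + 1.0) / 2.0 * resolution), resolution - 1)


def serialize_mesh(vertices, faces):
    """Flatten triangles into 9 tokens per face (3 vertices x 3 coordinates).

    Also returns, per token, its face index and its vertex slot (0, 1, 2)
    within the face. This layout is an assumption, not TSSR's tokenizer.
    """
    tokens, face_ids, vert_slots = [], [], []
    for f_idx, face in enumerate(faces):
        for slot, v_idx in enumerate(face):
            for coord in vertices[v_idx]:
                tokens.append(quantize(coord))
                face_ids.append(f_idx)
                vert_slots.append(slot)
    return np.array(tokens), np.array(face_ids), np.array(vert_slots)


def rope_rotate(x, positions, base=10000.0):
    """Standard RoPE: rotate channel pairs (2i, 2i+1) by positions * base**(-2i/dim)."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)     # (half,)
    angles = positions[:, None] * freqs[None, :]  # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out


def multi_axis_rope(x, position_axes):
    """Hypothetical multi-axis RoPE: split channels into equal chunks and rotate
    each chunk by its own position index (e.g. face, vertex slot, sequence)."""
    chunks = np.split(x, len(position_axes), axis=-1)
    return np.concatenate(
        [rope_rotate(c, p) for c, p in zip(chunks, position_axes)], axis=-1
    )


# Usage: one triangle, 12 channels split into 3 chunks of 4.
verts = [(-1.0, -1.0, -1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
tokens, face_ids, vert_slots = serialize_mesh(verts, [(0, 1, 2)])
# tokens: [0, 0, 0, 1023, 512, 512, 512, 1023, 768]
x = np.random.default_rng(0).normal(size=(len(tokens), 12))
y = multi_axis_rope(x, [face_ids, vert_slots, np.arange(len(tokens))])
```

Because RoPE only rotates channel pairs, it preserves each token's norm, and the dot product between two rotated vectors depends only on their position offsets along each axis. That relative-position property is what lets bidirectional attention compare tokens by face and vertex distance rather than by absolute index.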
📝 Abstract
In this paper, we introduce Topology Sculptor, Shape Refiner (TSSR), a novel method for generating high-quality, artist-style 3D meshes based on Discrete Diffusion Models (DDMs). Our primary motivation for TSSR is to achieve highly accurate token prediction while enabling parallel generation, a significant advantage over sequential autoregressive methods. Because TSSR "sees" all mesh tokens concurrently, it can predict many tokens per step instead of one, which improves both efficiency and control over generation. We leverage this parallel generation capability through three key innovations: 1) Decoupled Training and Hybrid Inference, which separates DDM-based generation into a topology sculpting stage and a subsequent shape refinement stage. This decoupling lets TSSR capture both intricate local topology and global shape. 2) An Improved Hourglass Architecture, featuring bidirectional attention enriched by face-vertex-sequence-level Rotary Positional Embeddings (RoPE), thereby capturing richer contextual information across the mesh structure. 3) A novel Connection Loss, which acts as a topological constraint to further enhance the realism and fidelity of the generated meshes. Extensive experiments on complex datasets demonstrate that TSSR generates high-quality, artist-style 3D meshes with up to 10,000 faces at a spatial resolution of $1024^3$. The code will be released at: https://github.com/psky1111/Tencent-TSSR.
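Parallel generation with a masked (absorbing-state) discrete diffusion model can be illustrated with the confidence-based unmasking sampler popularized by MaskGIT. The sampler starts from an all-masked sequence, lets a bidirectional model predict every position at once, commits the most confident predictions, and repeats on a cosine schedule. This is a generic sketch, not TSSR's hybrid inference procedure: `predict_logits` stands in for the hourglass network, and `MASK`, `cosine_mask_ratio`, `parallel_decode`, and `toy_logits` are hypothetical names.

```python
import numpy as np

MASK = -1  # hypothetical sentinel id for the absorbing [MASK] state


def cosine_mask_ratio(t):
    """Fraction of tokens still masked at progress t in [0, 1] (MaskGIT-style schedule)."""
    return np.cos(np.pi / 2 * t)


def parallel_decode(predict_logits, seq_len, num_steps=8, rng=None):
    """Iteratively unmask a sequence, predicting all positions in parallel each step.

    predict_logits(tokens) -> array (seq_len, vocab) is a stand-in for a
    bidirectional network that sees every token, masked or not, at once.
    """
    if rng is None:
        rng = np.random.default_rng()
    tokens = np.full(seq_len, MASK)
    for step in range(1, num_steps + 1):
        masked = tokens == MASK
        if not masked.any():
            break
        logits = predict_logits(tokens)
        # Softmax over the vocabulary for every position.
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        # Sample a candidate token for every position simultaneously.
        cand = np.array([rng.choice(probs.shape[-1], p=p) for p in probs])
        conf = probs[np.arange(seq_len), cand]
        conf[~masked] = -np.inf  # already committed tokens are never revisited
        # How many tokens should remain masked after this step.
        keep_masked = int(np.floor(cosine_mask_ratio(step / num_steps) * seq_len))
        num_reveal = max(int(masked.sum()) - keep_masked, 1)
        reveal = np.argsort(-conf)[:num_reveal]  # most confident masked positions
        tokens[reveal] = cand[reveal]
    return tokens


def toy_logits(tokens, vocab=16):
    """Toy stand-in model: strongly prefers token id (position mod vocab)."""
    n = len(tokens)
    logits = np.full((n, vocab), -50.0)
    logits[np.arange(n), np.arange(n) % vocab] = 50.0
    return logits


out = parallel_decode(toy_logits, seq_len=20, num_steps=4, rng=np.random.default_rng(0))
# out equals np.arange(20) % 16
```

With `num_steps=8`, a 90,000-token sequence needs 8 forward passes of the network rather than one decoding step per token, which is the efficiency argument for parallel generation; in practice, sample quality depends on the model and on the unmasking schedule.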