Topology Sculptor, Shape Refiner: Discrete Diffusion Model for High-Fidelity 3D Meshes Generation

📅 2025-10-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the low efficiency and poor geometric fidelity of autoregressive modeling in 3D mesh generation by proposing a parallel generative framework based on discrete diffusion models. The method follows a decoupled two-stage paradigm: (i) topology sculpting, which produces plausible face-vertex sequences, followed by (ii) shape refinement, which enhances geometric detail. Key technical contributions are an improved hourglass network with bidirectional attention, rotary position embeddings (RoPE) applied at the face, vertex, and sequence levels, and a connection loss that acts as a topological constraint to improve topological validity and spatial modeling accuracy. In experiments on complex datasets, the method generates high-quality, artist-style 3D meshes with up to 10,000 faces at a spatial resolution of up to $1024^3$. Reported gains over prior approaches are in generation speed, topological consistency, and surface detail fidelity.
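To make the "$1024^3$ resolution" and "face-vertex sequence" wording concrete, here is a minimal sketch of how a triangle mesh could be turned into discrete tokens. It assumes a common convention that this page does not confirm for TSSR: coordinates normalized to [-1, 1], 1024 uniform bins per axis, and nine coordinate tokens per triangle. The function names are hypothetical.

```python
import numpy as np

def quantize_vertices(vertices, resolution=1024):
    """Map float coordinates in [-1, 1] to integer bins in [0, resolution - 1]."""
    v = np.clip(np.asarray(vertices, dtype=np.float64), -1.0, 1.0)
    bins = np.floor((v + 1.0) / 2.0 * resolution).astype(np.int64)
    # The upper boundary 1.0 would land in bin `resolution`; fold it into the last bin.
    return np.minimum(bins, resolution - 1)

def dequantize_vertices(bins, resolution=1024):
    """Return the centre of each bin, back in [-1, 1]."""
    return (np.asarray(bins, dtype=np.float64) + 0.5) / resolution * 2.0 - 1.0

def faces_to_tokens(vertices, faces, resolution=1024):
    """Flatten triangles into one token sequence: face -> vertex -> (x, y, z)."""
    q = quantize_vertices(vertices, resolution)   # (V, 3) integer coordinates
    return q[np.asarray(faces)].reshape(-1)       # (F, 3, 3) -> (9 * F,)

if __name__ == "__main__":
    verts = [[-1, -1, -1], [1, 1, 1], [0, 0, 0], [0.5, -0.5, 0]]
    faces = [[0, 1, 2], [0, 2, 3]]
    print(faces_to_tokens(verts, faces))
```

Under these assumptions a 10,000-face mesh becomes a sequence of 90,000 tokens, each drawn from a vocabulary of 1024 coordinate values, which gives a sense of why decoding one token at a time is costly.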

📝 Abstract

In this paper, we introduce Topology Sculptor, Shape Refiner (TSSR), a novel method for generating high-quality, artist-style 3D meshes based on Discrete Diffusion Models (DDMs). Our primary motivation for TSSR is to achieve highly accurate token prediction while enabling parallel generation, a significant advantage over sequential autoregressive methods. By allowing TSSR to "see" all mesh tokens concurrently, we unlock a new level of efficiency and control. We leverage this parallel generation capability through three key innovations: 1) Decoupled Training and Hybrid Inference, which distinctly separates the DDM-based generation into a topology sculpting stage and a subsequent shape refinement stage. This strategic decoupling enables TSSR to effectively capture both intricate local topology and overarching global shape. 2) An Improved Hourglass Architecture, featuring bidirectional attention enriched by face-vertex-sequence level Rotational Positional Embeddings (RoPE), thereby capturing richer contextual information across the mesh structure. 3) A novel Connection Loss, which acts as a topological constraint to further enhance the realism and fidelity of the generated meshes. Extensive experiments on complex datasets demonstrate that TSSR generates high-quality 3D artist-style meshes, capable of achieving up to 10,000 faces at a remarkable spatial resolution of $1024^3$. The code will be released at: https://github.com/psky1111/Tencent-TSSR.
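The contrast between parallel and sequential generation can be illustrated with a toy decoding loop. This is a sketch under assumptions, not the TSSR sampler: it assumes an absorbing-state ("mask") discrete diffusion process with confidence-based unmasking, and it replaces the network with a random-logit stand-in. `MASK`, `toy_denoiser`, and `parallel_decode` are hypothetical names.

```python
import numpy as np

MASK = -1  # hypothetical id for the absorbing [MASK] state

def toy_denoiser(tokens, vocab_size, rng):
    """Stand-in for the trained network: logits of shape (len(tokens), vocab_size)."""
    return rng.normal(size=(len(tokens), vocab_size))

def parallel_decode(seq_len, vocab_size, num_steps, denoiser=toy_denoiser, seed=0):
    """Start fully masked; each step predicts every position and keeps the most confident."""
    rng = np.random.default_rng(seed)
    tokens = np.full(seq_len, MASK, dtype=np.int64)
    for step in range(num_steps):
        masked = np.flatnonzero(tokens == MASK)
        if masked.size == 0:
            break
        logits = denoiser(tokens, vocab_size, rng)
        logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        pred = probs.argmax(axis=1)   # greedy token per position
        conf = probs.max(axis=1)      # confidence of that token
        # Spread the still-masked positions evenly over the remaining steps.
        k = int(np.ceil(masked.size / (num_steps - step)))
        chosen = masked[np.argsort(-conf[masked])[:k]]
        tokens[chosen] = pred[chosen]
    return tokens

if __name__ == "__main__":
    out = parallel_decode(seq_len=18, vocab_size=1024, num_steps=4)
    print(out)
```

In this toy setup, an 18-token sequence is filled in with 4 network calls, whereas an autoregressive decoder would need 18. Every call also sees all positions at once, which is the property the abstract describes as letting the model "see" all mesh tokens concurrently.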

Problem

Research questions and friction points this paper is trying to address.

Generating high-quality artist-style 3D meshes efficiently

Achieving parallel generation for accurate mesh token prediction

Capturing intricate local topology and global shape simultaneously

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled training and hybrid inference for topology and shape

Improved hourglass architecture with bidirectional attention and face-vertex-sequence level RoPE

Novel connection loss enhances mesh realism and fidelity
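As an illustration of what position embeddings at the face, vertex, and sequence levels might look like, the sketch below applies standard rotary embeddings with three different position indices to three slices of the feature dimension. How TSSR actually combines the levels is not stated on this page, so the three-way split, the index definitions, and the name `multi_level_rope` are assumptions.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Standard rotary embedding on x of shape (n, d) with d even."""
    n, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)              # (half,)
    angles = np.asarray(positions)[:, None] * freqs[None]  # (n, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1_i, x2_i) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

def multi_level_rope(x, tokens_per_vertex=3, vertices_per_face=3):
    """Assumed scheme: one third of the channels per level (face, vertex, sequence)."""
    n, d = x.shape
    assert d % 6 == 0, "each of the three slices needs an even width"
    seq = np.arange(n)
    vertex = (seq // tokens_per_vertex) % vertices_per_face  # vertex slot within its face
    face = seq // (tokens_per_vertex * vertices_per_face)    # index of the face
    chunk = d // 3
    parts = [rope(x[:, i * chunk:(i + 1) * chunk], pos)
             for i, pos in enumerate((face, vertex, seq))]
    return np.concatenate(parts, axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(18, 12))  # 2 faces x 3 vertices x 3 coordinates
    print(multi_level_rope(q).shape)
```

Because each slice is a pure rotation, token norms are unchanged, and attention scores between rotated queries and keys depend only on the differences of the corresponding indices. This is the usual motivation for rotary embeddings.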

🔎 Similar Papers

CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner