A Scalable Attention-Based Approach for Image-to-3D Texture Mapping

๐Ÿ“… 2025-09-05
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing single-image 3D texture generation methods rely on UV parameterization and differentiable rendering, resulting in slow inference and limited texture fidelity. This paper introduces the first end-to-end Transformer framework that directly predicts a 3D texture field from a single imageโ€”without requiring UV mapping or differentiable rendering. Our method employs a triplane feature representation to jointly model geometry-appearance correlations and introduces a depth-guided back-projection loss for efficient, supervision-compatible training. The architecture enables single-pass forward inference (0.2 seconds per image). Extensive qualitative evaluation, quantitative benchmarks, and user studies demonstrate consistent superiority over state-of-the-art methods, achieving significant gains in texture fidelity and visual quality. This work establishes a new paradigm for single-image texture reconstruction that simultaneously delivers high quality, high efficiency, and strong generalization.

Technology Category

Application Category

๐Ÿ“ Abstract
High-quality textures are critical for realistic 3D content creation, yet existing generative methods are slow, rely on UV maps, and often fail to remain faithful to a reference image. To address these challenges, we propose a transformer-based framework that predicts a 3D texture field directly from a single image and a mesh, eliminating the need for UV mapping and differentiable rendering, and enabling faster texture generation. Our method integrates a triplane representation with depth-based backprojection losses, enabling efficient training and faster inference. Once trained, it generates high-fidelity textures in a single forward pass, requiring only 0.2s per shape. Extensive qualitative, quantitative, and user preference evaluations demonstrate that our method outperforms state-of-the-art baselines on single-image texture reconstruction in terms of both fidelity to the input image and perceptual quality, highlighting its practicality for scalable, high-quality, and controllable 3D content creation.
Problem

Research questions and friction points this paper is trying to address.

Generating high-fidelity 3D textures from single images
Eliminating UV mapping and slow differentiable rendering
Achieving fast, scalable 3D content creation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based framework for direct 3D texture prediction
Triplane representation with depth-based backprojection losses
Single forward pass generation eliminating UV mapping
๐Ÿ”Ž Similar Papers
No similar papers found.