A Scalable Attention-Based Approach for Image-to-3D Texture Mapping

📅 2025-09-05

📈 Citations: 0

✨ Influential: 0

career value

187K/year

🤖 AI Summary

Existing single-image 3D texture generation methods rely on UV parameterization and differentiable rendering, resulting in slow inference and limited texture fidelity. This paper introduces the first end-to-end Transformer framework that directly predicts a 3D texture field from a single image—without requiring UV mapping or differentiable rendering. Our method employs a triplane feature representation to jointly model geometry-appearance correlations and introduces a depth-guided back-projection loss for efficient, supervision-compatible training. The architecture enables single-pass forward inference (0.2 seconds per image). Extensive qualitative evaluation, quantitative benchmarks, and user studies demonstrate consistent superiority over state-of-the-art methods, achieving significant gains in texture fidelity and visual quality. This work establishes a new paradigm for single-image texture reconstruction that simultaneously delivers high quality, high efficiency, and strong generalization.

Technology Category

Application Category

📝 Abstract

High-quality textures are critical for realistic 3D content creation, yet existing generative methods are slow, rely on UV maps, and often fail to remain faithful to a reference image. To address these challenges, we propose a transformer-based framework that predicts a 3D texture field directly from a single image and a mesh, eliminating the need for UV mapping and differentiable rendering, and enabling faster texture generation. Our method integrates a triplane representation with depth-based backprojection losses, enabling efficient training and faster inference. Once trained, it generates high-fidelity textures in a single forward pass, requiring only 0.2s per shape. Extensive qualitative, quantitative, and user preference evaluations demonstrate that our method outperforms state-of-the-art baselines on single-image texture reconstruction in terms of both fidelity to the input image and perceptual quality, highlighting its practicality for scalable, high-quality, and controllable 3D content creation.

Problem

Research questions and friction points this paper is trying to address.

Generating high-fidelity 3D textures from single images

Eliminating UV mapping and slow differentiable rendering

Achieving fast, scalable 3D content creation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based framework for direct 3D texture prediction

Triplane representation with depth-based backprojection losses

Single forward pass generation eliminating UV mapping

🔎 Similar Papers

No similar papers found.