🤖 AI Summary
This paper addresses high-fidelity material reconstruction of 3D objects from single- or multi-view images and introduces the first end-to-end diffusion-based framework for this task. The method employs a two-stage progressive inference scheme: it first predicts base material properties (e.g., BRDF parameters), then generates complete material maps for unseen views via a novel view-material cross-attention (VMCA) mechanism that fuses input-view features. VMCA supports an arbitrary number of input images without requiring auxiliary pre-trained models or geometric priors. Compared to existing approaches, the method achieves significant improvements in material accuracy, cross-view consistency, and visual fidelity, establishing new state-of-the-art performance across multiple benchmarks while demonstrating strong generalization and cross-object stability.
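The summary gives no implementation details for VMCA; purely as an illustration, a cross-attention layer in which material-branch queries attend to features from a variable number of input views might be sketched as below. All names, dimensions, and the residual layout (`ViewMaterialCrossAttention`, `dim=256`, the token counts) are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class ViewMaterialCrossAttention(nn.Module):
    """Hypothetical sketch of a view-material cross-attention layer:
    material-branch tokens (queries) attend to features pooled from an
    arbitrary number of input views (keys/values). Illustrative only."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, material_tokens: torch.Tensor,
                view_features: torch.Tensor) -> torch.Tensor:
        # material_tokens: (B, N_mat, dim) tokens for the view being generated
        # view_features:   (B, N_views * N_tokens, dim) input-view features,
        #                  concatenated along the token axis, so any number
        #                  of views can be supplied at inference time.
        q = self.norm_q(material_tokens)
        kv = self.norm_kv(view_features)
        fused, _ = self.attn(q, kv, kv)
        return material_tokens + fused  # residual connection

# Example: fusing 3 input views into 64 material tokens.
vmca = ViewMaterialCrossAttention(dim=256, num_heads=8)
mat = torch.randn(1, 64, 256)          # material-branch tokens
views = torch.randn(1, 3 * 196, 256)   # 3 views x 196 tokens each
out = vmca(mat, views)                 # (1, 64, 256)
```

Because the views enter only as keys/values along the token axis, nothing in this layout fixes the number of input images, which is consistent with the scalability claim above.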
📝 Abstract
Applying diffusion models to physically-based material estimation and generation has recently gained prominence. In this paper, we propose tt, a novel material reconstruction framework for 3D objects that offers the following advantages. First, tt adopts a two-stage reconstruction scheme, starting with accurate material prediction from the input views, followed by prior-guided material generation for unobserved views, yielding high-fidelity results. Second, by combining progressive inference with the proposed view-material cross-attention (VMCA), tt enables reconstruction from an arbitrary number of input images, demonstrating strong scalability and flexibility. Finally, tt achieves both material prediction and generation through end-to-end optimization of a single diffusion model, without relying on additional pre-trained models, and thereby exhibits enhanced stability across diverse object types. Extensive experiments demonstrate that tt outperforms existing methods in material reconstruction.
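As a rough sketch of the two-stage progressive inference described above, the control flow might resemble the following. The `predict`/`generate` entry points and the `DummyModel` stand-in are hypothetical; the abstract does not specify this API.

```python
import torch

class DummyModel:
    """Stand-in for the trained diffusion model; predict/generate are
    assumed entry points for illustration only."""
    def predict(self, view):
        return torch.randn(4, 64, 64)   # e.g., 4 BRDF parameter channels

    def generate(self, view, context):
        return torch.randn(4, 64, 64)   # context would be fused via VMCA

def reconstruct_materials(model, input_views, target_views):
    # Stage 1: predict base material maps for the observed input views.
    predicted = [model.predict(v) for v in input_views]

    # Stage 2: progressively generate material maps for unobserved views,
    # conditioning each step on everything reconstructed so far.
    context, generated = list(predicted), []
    for tv in target_views:
        m = model.generate(tv, context)
        generated.append(m)
        context.append(m)  # progressive: feed back for subsequent views
    return predicted, generated

model = DummyModel()
inputs = [torch.randn(3, 64, 64) for _ in range(2)]   # 2 observed views
targets = [torch.randn(3, 64, 64) for _ in range(4)]  # 4 unseen views
pred, gen = reconstruct_materials(model, inputs, targets)
print(len(pred), len(gen))  # 2 4
```

Feeding each generated map back into the conditioning set is one plausible reading of "progressive inference"; the paper itself should be consulted for the exact schedule.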