IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations

📅 2024-12-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses intrinsic image decomposition under arbitrary numbers of input views and lighting conditions. The authors propose the first end-to-end, multi-view consistent framework for joint geometry and material estimation. Methodologically, they design a cross-view, cross-domain attention mechanism that integrates diffusion-based priors with physically grounded rendering constraints; introduce illumination-augmentation and view-adaptive training strategies; and construct ARB-Objaverse, the first large-scale multi-view, multi-illumination intrinsic dataset. Experiments demonstrate significant improvements over state-of-the-art methods in surface normal and material property estimation, both qualitatively and quantitatively, and the approach transfers to downstream tasks including single-image relighting, photometric stereo, and 3D reconstruction. It effectively mitigates the longstanding ambiguity between illumination and material, as well as multi-view inconsistency.

📝 Abstract
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics. Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs, while still struggling with inherent ambiguities between lighting and material. On the other hand, learning-based approaches leverage rich material priors from existing 3D object datasets but face challenges with maintaining multi-view consistency. In this paper, we introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations. Our method achieves accurate and multi-view consistent estimation on surface normals and material properties. This is made possible through a novel cross-view, cross-domain attention module and an illumination-augmented, view-adaptive training strategy. Additionally, we introduce ARB-Objaverse, a new dataset that provides large-scale multi-view intrinsic data and renderings under diverse lighting conditions, supporting robust training. Extensive experiments demonstrate that IDArb outperforms state-of-the-art methods both qualitatively and quantitatively. Moreover, our approach facilitates a range of downstream tasks, including single-image relighting, photometric stereo, and 3D reconstruction, highlighting its broad applications in realistic 3D content creation.
Problem

Research questions and friction points this paper is trying to address.

Accurate intrinsic decomposition from arbitrary views and illuminations.
Overcoming ambiguities between lighting and material properties.
Ensuring multi-view consistency in geometry and material estimation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based model for intrinsic decomposition
Cross-view, cross-domain attention module
Illumination-augmented, view-adaptive training strategy
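The paper does not publish the internals of its attention module here, but the idea named above can be sketched: tokens for each view and each intrinsic domain (e.g. normal, albedo, roughness/metallic) first attend across views within a domain, then across domains within a view. Below is a minimal NumPy illustration under assumed shapes; the function names, token layout, and single-head, unprojected attention are simplifications for exposition, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # single-head scaled dot-product attention (no learned projections)
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ v

def cross_view_cross_domain_attention(tokens):
    """tokens: (n_views, n_domains, n_tokens, dim).

    Cross-view step: for each intrinsic domain, tokens from all views
    attend jointly, enforcing multi-view consistency.
    Cross-domain step: for each view, tokens from all domains attend
    jointly, coupling geometry and material predictions.
    """
    V, D, N, C = tokens.shape
    # cross-view: fold the view axis into the token sequence, per domain
    x = tokens.transpose(1, 0, 2, 3).reshape(D, V * N, C)
    x = attention(x, x, x)
    x = x.reshape(D, V, N, C).transpose(1, 0, 2, 3)
    # cross-domain: fold the domain axis into the token sequence, per view
    y = x.reshape(V, D * N, C)
    y = attention(y, y, y)
    return y.reshape(V, D, N, C)

# example: 4 views, 3 intrinsic domains, 8 tokens of dimension 16
tokens = np.random.default_rng(0).normal(size=(4, 3, 8, 16))
out = cross_view_cross_domain_attention(tokens)
```

Because both steps reshape rather than pad, the same module handles an arbitrary number of input views, which is the property the paper emphasizes.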