🤖 AI Summary
Inverse rendering of indoor scenes from a single RGB image suffers from ill-posedness in jointly decomposing geometry, material, and lighting. To address this, we propose a diffusion-based inverse rendering framework. Our key innovation is a novel channel-wise noise scheduling mechanism that jointly optimizes both reconstruction fidelity and solution diversity within a unified model—overcoming the conventional trade-off between these competing objectives. Furthermore, we introduce a joint geometric-material-lighting modeling paradigm coupled with conditional sampling strategies to enable structured, controllable generation. Evaluated on standard benchmarks, our method achieves significant improvements over state-of-the-art approaches in both reconstruction accuracy and solution-space diversity. Moreover, it consistently enhances downstream applications such as object insertion and material editing.
📝 Abstract
We propose a diffusion-based inverse rendering framework that decomposes a single RGB image into geometry, material, and lighting. Inverse rendering is inherently ill-posed, making it difficult to predict a single accurate solution. To address this challenge, recent generative model-based methods aim to present a range of possible solutions. However, finding a single accurate solution and generating diverse solutions can be conflicting. In this paper, we propose a channel-wise noise scheduling approach that allows a single diffusion model architecture to achieve two conflicting objectives. The resulting two diffusion models, trained with different channel-wise noise schedules, can predict a single highly accurate solution and present multiple possible solutions. The experimental results demonstrate the superiority of our two models in terms of both diversity and accuracy, which translates to enhanced performance in downstream applications such as object insertion and material editing.