🤖 AI Summary
This work addresses the cross-view inconsistency of intrinsic components (diffuse albedo, diffuse shading, and specular shading) in multi-view intrinsic decomposition. We propose IDT, an end-to-end feed-forward Intrinsic Decomposition Transformer that embeds a physically grounded three-factor image formation model (Lambertian diffuse reflectance and shading plus a non-Lambertian specular term) into a Transformer architecture. By jointly encoding multi-view inputs with cross-view attention, IDT estimates view-consistent diffuse albedo, diffuse shading, and specular shading in a single forward pass, without iterative optimization or generative sampling. The explicit factorization makes the decomposition interpretable and controllable. On both synthetic and real-world datasets, IDT substantially improves cross-view consistency, yielding cleaner albedo maps, more coherent shading, and better-isolated specular components, and it offers a feed-forward alternative to diffusion-based pipelines for multi-view intrinsic image decomposition.
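Cross-view attention is what lets each view's features depend on the other views. The sketch below is a minimal illustration, not IDT's architecture: it assumes tokens from all views are concatenated into one joint sequence and mixed by single-head scaled dot-product self-attention with no learned projections. The function `cross_view_attention` and the toy token values are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_attention(views: List[List[Vector]]) -> List[List[Vector]]:
    """Single-head scaled dot-product attention over tokens from all views.

    Hypothetical sketch: queries, keys, and values are the raw token
    features (no learned projections), so every token can attend to
    tokens from every other view.
    """
    # Flatten the per-view token lists into one joint sequence.
    tokens = [tok for view in views for tok in view]
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    mixed = []
    for q in tokens:
        # Attention weights of this query over all tokens of all views.
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in tokens])
        # Output is the weighted average of the value vectors.
        mixed.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    # Split the joint sequence back into per-view token lists.
    out, start = [], 0
    for view in views:
        out.append(mixed[start:start + len(view)])
        start += len(view)
    return out


views = [
    [[1.0, 0.0], [0.0, 1.0]],  # view 0: two tokens
    [[1.0, 0.0]],              # view 1: one token
]
result = cross_view_attention(views)
```

Because the query `[1.0, 0.0]` appears in both views and sees the same joint key set, its two outputs are identical, and changing view 1's tokens changes view 0's outputs; attending within each view separately would provide neither property. A practical model would add learned query/key/value projections, multiple heads, and view or positional embeddings.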
📝 Abstract
Intrinsic image decomposition is fundamental for visual understanding, as RGB images entangle material properties, illumination, and view-dependent effects. Recent diffusion-based methods have achieved strong results for single-view intrinsic decomposition; however, extending these approaches to multi-view settings remains challenging and often leads to severe view inconsistency. We propose the Intrinsic Decomposition Transformer (IDT), a feed-forward framework for multi-view intrinsic image decomposition. By leveraging transformer-based attention to jointly reason over multiple input images, IDT produces view-consistent intrinsic factors in a single forward pass, without iterative generative sampling. IDT adopts a physically grounded image formation model that explicitly decomposes images into diffuse reflectance, diffuse shading, and specular shading. This structured factorization separates Lambertian from non-Lambertian light transport, enabling interpretable and controllable decomposition of material and illumination effects across views. Experiments on both synthetic and real-world datasets demonstrate that IDT achieves cleaner diffuse reflectance, more coherent diffuse shading, and better-isolated specular components, while substantially improving multi-view consistency compared to prior intrinsic decomposition methods.
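The abstract does not spell out the formation equation. A common way to write a three-factor model with a Lambertian product and an additive non-Lambertian term is I = R ⊙ S_d + S_s, where R is diffuse reflectance, S_d is diffuse shading, S_s is specular shading, and ⊙ is per-channel multiplication. The sketch below assumes exactly this form; IDT's actual formulation may differ. `compose` and `recover_reflectance` are hypothetical helpers operating on flattened lists of RGB pixels.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
Image = List[RGB]  # flattened list of pixels


def compose(reflectance: Image, diffuse_shading: Image, specular: Image) -> Image:
    """Assumed model: I = R * S_d + S_s, applied per pixel and per channel."""
    return [
        tuple(r * sd + ss for r, sd, ss in zip(rp, dp, sp))
        for rp, dp, sp in zip(reflectance, diffuse_shading, specular)
    ]


def recover_reflectance(image: Image, diffuse_shading: Image,
                        specular: Image, eps: float = 1e-6) -> Image:
    # Invert the assumed model: subtract the additive specular term, then
    # divide out the diffuse shading; eps guards against division by zero.
    return [
        tuple((i - ss) / max(sd, eps) for i, sd, ss in zip(ip, dp, sp))
        for ip, dp, sp in zip(image, diffuse_shading, specular)
    ]


R = [(0.8, 0.2, 0.2), (0.1, 0.5, 0.9)]      # diffuse reflectance
S_d = [(0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]    # diffuse shading
S_s = [(0.0, 0.0, 0.0), (0.25, 0.25, 0.25)]  # specular shading
I = compose(R, S_d, S_s)
R_hat = recover_reflectance(I, S_d, S_s)
```

Keeping S_s additive mirrors the Lambertian/non-Lambertian split described above: under the Lambertian assumption, R and S_d do not change with viewpoint, whereas the specular term does, so folding it into reflectance would make the recovered reflectance view-dependent.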