IDT: A Physically Grounded Transformer for Feed-Forward Multi-View Intrinsic Decomposition

📅 2025-12-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses cross-view inconsistency in the intrinsic components (diffuse albedo, diffuse shading, and specular shading) recovered by multi-view intrinsic decomposition. We propose IDT, an end-to-end feed-forward Intrinsic Decomposition Transformer that embeds a physically grounded three-factor image formation model, separating Lambertian (diffuse) from non-Lambertian (specular) light transport, into a Transformer architecture. By jointly encoding multi-view inputs and applying attention across views, IDT estimates view-consistent diffuse albedo, diffuse shading, and specular shading in a single forward pass, without iterative optimization or generative sampling. The explicit factorization makes the decomposition interpretable and controllable. On both synthetic and real-world datasets, IDT improves cross-view consistency over prior intrinsic decomposition methods, yielding cleaner albedo maps, more coherent shading, and better-isolated specular components.

📝 Abstract
Intrinsic image decomposition is fundamental for visual understanding, as RGB images entangle material properties, illumination, and view-dependent effects. Recent diffusion-based methods have achieved strong results for single-view intrinsic decomposition; however, extending these approaches to multi-view settings remains challenging, often leading to severe view inconsistency. We propose Intrinsic Decomposition Transformer (IDT), a feed-forward framework for multi-view intrinsic image decomposition. By leveraging transformer-based attention to jointly reason over multiple input images, IDT produces view-consistent intrinsic factors in a single forward pass, without iterative generative sampling. IDT adopts a physically grounded image formation model that explicitly decomposes images into diffuse reflectance, diffuse shading, and specular shading. This structured factorization separates Lambertian and non-Lambertian light transport, enabling interpretable and controllable decomposition of material and illumination effects across views. Experiments on both synthetic and real-world datasets demonstrate that IDT achieves cleaner diffuse reflectance, more coherent diffuse shading, and better-isolated specular components, while substantially improving multi-view consistency compared to prior intrinsic decomposition methods.
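
The abstract names three factors (diffuse reflectance, diffuse shading, specular shading) but does not give the image formation equation. The sketch below assumes a common additive form, image = albedo * diffuse_shading + specular_shading, purely for illustration; the function name `compose_image` and the array shapes are hypothetical and not taken from the paper.

```python
import numpy as np


def compose_image(albedo, diffuse_shading, specular_shading):
    """Recombine three intrinsic factors into an RGB image.

    Assumption (not stated in the abstract): the factors combine as
    albedo * diffuse_shading + specular_shading, where the product is
    the Lambertian term and the additive term is the specular residual.
    Inputs are arrays broadcastable to shape (H, W, 3).
    """
    albedo = np.asarray(albedo, dtype=np.float64)
    diffuse_shading = np.asarray(diffuse_shading, dtype=np.float64)
    specular_shading = np.asarray(specular_shading, dtype=np.float64)
    return albedo * diffuse_shading + specular_shading


# Toy 2x2 example: grey albedo, uniform shading, one specular highlight.
albedo = np.full((2, 2, 3), 0.5)
diffuse_shading = np.full((2, 2, 1), 0.8)  # broadcasts over RGB channels
specular_shading = np.zeros((2, 2, 3))
specular_shading[0, 0] = 0.1               # highlight at the top-left pixel

image = compose_image(albedo, diffuse_shading, specular_shading)
```

Under this assumed model, editing one factor while holding the others fixed (for example, swapping in a different diffuse shading map) is what makes the decomposition controllable in the sense the abstract describes.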
Problem

Research questions and friction points this paper is trying to address.

Multi-view intrinsic decomposition for view consistency
Separating material and illumination effects across views
Achieving interpretable decomposition without iterative sampling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based attention for multi-view reasoning
Physically grounded image formation model for decomposition
Feed-forward framework without iterative generative sampling
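
One way to picture attention that reasons jointly over several views is to flatten the tokens of all views into a single sequence, so that every token can attend to tokens from every other view in one pass. The sketch below is a minimal single-head NumPy illustration of that idea only; `joint_view_attention`, the weight matrices, and the token shapes are hypothetical, and the paper's actual architecture is not described in this summary.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_view_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over the tokens of all views at once.

    tokens: array of shape (V, N, D) with V views, N tokens per view.
    w_q, w_k, w_v: projection matrices of shape (D, D_out).
    Returns the attended tokens, shape (V, N, D_out), and the
    attention matrix, shape (V*N, V*N).
    """
    num_views, num_tokens, dim = tokens.shape
    flat = tokens.reshape(num_views * num_tokens, dim)  # merge views into one sequence
    q, k, v = flat @ w_q, flat @ w_k, flat @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    out = attn @ v
    return out.reshape(num_views, num_tokens, -1), attn


# Toy input: 3 views, 4 tokens per view, 8 feature channels.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = joint_view_attention(tokens, w_q, w_k, w_v)
```

Because the attention matrix spans all views, the off-diagonal blocks (for example `attn[:4, 4:]`) carry the cross-view interactions; a per-view model would only have the diagonal blocks.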
Kang Du
University of Utah
Causal Inference, Domain Generalization
Yirui Guan
Tencent
Zeyu Wang
The Hong Kong University of Science and Technology (Guangzhou), The Hong Kong University of Science and Technology