IDT: A Physically Grounded Transformer for Feed-Forward Multi-View Intrinsic Decomposition

📅 2025-12-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the cross-view inconsistency of diffuse reflectance (albedo), diffuse shading, and specular shading in multi-view intrinsic decomposition. We propose IDT, an end-to-end feed-forward Intrinsic Decomposition Transformer that embeds a physically grounded three-factor image formation model (a Lambertian diffuse term plus a non-Lambertian specular term) into a Transformer architecture. By jointly encoding multi-view inputs and applying cross-view attention, IDT estimates view-consistent diffuse albedo, diffuse shading, and specular shading in a single forward pass, without iterative optimization. The explicit factorization makes the decomposition interpretable and controllable. Evaluated on both synthetic and real-world datasets, IDT improves cross-view consistency over prior intrinsic decomposition methods, yielding cleaner albedo maps, more coherent shading structures, and better-isolated specular components.

📝 Abstract

Intrinsic image decomposition is fundamental for visual understanding, as RGB images entangle material properties, illumination, and view-dependent effects. Recent diffusion-based methods have achieved strong results for single-view intrinsic decomposition; however, extending these approaches to multi-view settings remains challenging, often leading to severe view inconsistency. We propose the Intrinsic Decomposition Transformer (IDT), a feed-forward framework for multi-view intrinsic image decomposition. By leveraging transformer-based attention to jointly reason over multiple input images, IDT produces view-consistent intrinsic factors in a single forward pass, without iterative generative sampling. IDT adopts a physically grounded image formation model that explicitly decomposes images into diffuse reflectance, diffuse shading, and specular shading. This structured factorization separates Lambertian and non-Lambertian light transport, enabling interpretable and controllable decomposition of material and illumination effects across views. Experiments on both synthetic and real-world datasets demonstrate that IDT achieves cleaner diffuse reflectance, more coherent diffuse shading, and better-isolated specular components, while substantially improving multi-view consistency compared to prior intrinsic decomposition methods.
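To make the three-factor decomposition concrete, the sketch below recomposes an image from diffuse reflectance, diffuse shading, and specular shading. The abstract names the three factors but not the exact equation, so the formula used here (image = reflectance × diffuse shading + specular shading) is an assumption based on the formulation commonly used in intrinsic decomposition, and the function names `compose_image` and `edit_albedo` are hypothetical.

```python
import numpy as np

def compose_image(albedo, diffuse_shading, specular_shading):
    """Recompose RGB under the ASSUMED model I = A * S_d + S_s.

    albedo:           (H, W, 3) diffuse reflectance
    diffuse_shading:  (H, W, 1) or (H, W, 3) Lambertian shading
    specular_shading: (H, W, 3) additive non-Lambertian term
    """
    return albedo * diffuse_shading + specular_shading

def edit_albedo(albedo, diffuse_shading, specular_shading, tint):
    """Recolour the material while leaving both shading terms untouched."""
    return compose_image(albedo * tint, diffuse_shading, specular_shading)

# Toy factors standing in for the network's predictions (illustrative only).
rng = np.random.default_rng(0)
H, W = 4, 4
albedo = rng.uniform(0.2, 0.8, size=(H, W, 3))
diffuse_shading = rng.uniform(0.0, 1.0, size=(H, W, 1))  # grey shading, broadcast over RGB
specular_shading = np.zeros((H, W, 3))
specular_shading[1, 2] = 0.5                             # a single highlight pixel

image = compose_image(albedo, diffuse_shading, specular_shading)
recoloured = edit_albedo(albedo, diffuse_shading, specular_shading,
                         tint=np.array([1.0, 0.5, 0.5]))
```

Because the specular term is additive under this assumed model, recolouring the albedo leaves the highlight unchanged, which is one sense in which such a factorization is controllable.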

Problem

Research questions and friction points this paper is trying to address.

Achieving view consistency in multi-view intrinsic decomposition

Separating material and illumination effects across views

Achieving interpretable decomposition without iterative sampling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based attention for multi-view reasoning

Physically grounded image formation model for decomposition

Feed-forward framework without iterative generative sampling
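The idea of attention reasoning jointly over several views can be sketched as ordinary self-attention applied to the concatenated tokens of all views. This is a generic single-head illustration and not the authors' architecture: the token layout, the projection matrices `w_q`, `w_k`, `w_v`, and the function name `cross_view_attention` are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over the tokens of all views at once.

    tokens: (V, N, D) array, V views with N tokens of dimension D each.
    Returns the attended tokens (V, N, D_out) and the attention
    weights (V*N, V*N), in which every token can attend to every view.
    """
    V, N, D = tokens.shape
    flat = tokens.reshape(V * N, D)            # merge views into one sequence
    q, k, v = flat @ w_q, flat @ w_k, flat @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])    # scaled dot-product scores
    weights = softmax(scores, axis=-1)
    out = weights @ v
    return out.reshape(V, N, -1), weights

# Toy input: 3 views, 5 tokens per view, 8 channels (illustrative sizes).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))

out, weights = cross_view_attention(tokens, w_q, w_k, w_v)
```

The off-diagonal blocks of `weights` (for example `weights[:5, 5:]`) hold the attention that tokens of one view pay to tokens of the other views, and the whole output is produced in one pass with no sampling loop.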
