AI Summary
Existing image relighting methods struggle to disentangle scene attributes from illumination on complex materials such as metal and glass, which limits relighting quality. This work uncovers the counterintuitive phenomenon that strong semantic priors can actually degrade relighting quality and proposes the Augmented Latent Intrinsics (ALI) framework. ALI fuses pixel-aligned visual features with implicit intrinsic representations to strengthen photometric structure modeling while preserving semantic context. Through a self-supervised strategy, the model is trained using only unlabeled real-world image pairs, without annotations. Experiments show that the method outperforms current state-of-the-art approaches overall, with the most significant gains on specular materials.
Abstract
Image-to-image relighting requires representations that disentangle scene properties from illumination. Recent methods rely on latent intrinsic representations but remain under-constrained and often fail on challenging materials such as metal and glass. A natural hypothesis is that stronger pretrained visual priors should resolve these failures. We find the opposite: features from top-performing semantic encoders often degrade relighting quality, revealing a fundamental trade-off between semantic abstraction and photometric fidelity. We study this trade-off and introduce Augmented Latent Intrinsics (ALI), which balances semantic context and dense photometric structure by fusing features from a pixel-aligned visual encoder into a latent-intrinsic framework, together with a self-supervised refinement strategy to mitigate the scarcity of paired real-world data. Trained only on unlabeled real-world image pairs and equipped with a dense, pixel-aligned visual prior, ALI achieves strong improvements in relighting, with the largest gains on complex, specular materials. Project page: https://augmented-latent-intrinsics.github.io
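The abstract describes fusing features from a pixel-aligned visual encoder into a latent-intrinsic framework. The sketch below illustrates one plausible form such a fusion could take, concatenating the two feature streams and mixing them with 1x1 convolutions; the module name (FusionBlock), channel widths, and feature shapes are hypothetical assumptions for illustration, not the authors' implementation.

```python
# Minimal, hypothetical sketch of pixel-aligned feature fusion into a
# latent-intrinsic feature map. Names and dimensions are illustrative
# assumptions, not the ALI architecture itself.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionBlock(nn.Module):
    """Fuse pixel-aligned visual encoder features with latent intrinsics."""

    def __init__(self, intrinsic_dim: int, visual_dim: int, out_dim: int):
        super().__init__()
        # Project both streams to a shared width, then mix with a 1x1 conv.
        self.proj_intrinsic = nn.Conv2d(intrinsic_dim, out_dim, kernel_size=1)
        self.proj_visual = nn.Conv2d(visual_dim, out_dim, kernel_size=1)
        self.mix = nn.Conv2d(2 * out_dim, out_dim, kernel_size=1)

    def forward(self, intrinsic_feats: torch.Tensor,
                visual_feats: torch.Tensor) -> torch.Tensor:
        # Resample the visual features to the intrinsic feature map's
        # resolution so the two streams stay pixel-aligned before fusion.
        visual_feats = F.interpolate(
            visual_feats, size=intrinsic_feats.shape[-2:],
            mode="bilinear", align_corners=False,
        )
        fused = torch.cat(
            [self.proj_intrinsic(intrinsic_feats),
             self.proj_visual(visual_feats)],
            dim=1,
        )
        return self.mix(fused)


if __name__ == "__main__":
    block = FusionBlock(intrinsic_dim=256, visual_dim=384, out_dim=256)
    intrinsic = torch.randn(1, 256, 64, 64)  # latent intrinsic feature map
    visual = torch.randn(1, 384, 32, 32)     # pixel-aligned encoder features
    print(block(intrinsic, visual).shape)    # torch.Size([1, 256, 64, 64])
```

Bilinear resampling before concatenation is one simple way to preserve the pixel alignment the paper emphasizes, since it keeps every fused feature vector spatially registered with its location in the intrinsic map.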