UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback

📅 2025-11-03
🤖 AI Summary
Existing diffusion-based relighting methods are typically optimized in a semantic latent space, which often leads to physical inconsistencies such as overexposed highlights, misaligned shadows, and occlusion errors. This work proposes UniLumos, a unified image and video relighting framework that adds RGB-space geometry feedback to a flow-matching backbone, targeting physical plausibility and high visual fidelity in a few sampling steps. Key contributions include: (1) supervision from depth and surface normal maps extracted from the model's outputs, combined with path consistency learning so that this supervision remains effective under few-step training; (2) a structured six-dimensional illumination annotation protocol; and (3) LumosBench, a disentangled attribute-level benchmark that uses large vision-language models (VLMs) for automatic and interpretable assessment of relighting precision. Experiments report state-of-the-art relighting quality with significantly improved physical consistency and a 20× speedup for both image and video relighting.

📝 Abstract
Relighting is a crucial task with both practical demand and artistic value, and recent diffusion models have shown strong potential by enabling rich and controllable lighting effects. However, as they are typically optimized in semantic latent space, where proximity does not guarantee physical correctness in visual space, they often produce unrealistic results, such as overexposed highlights, misaligned shadows, and incorrect occlusions. We address this with UniLumos, a unified relighting framework for both images and videos that brings RGB-space geometry feedback into a flow matching backbone. By supervising the model with depth and normal maps extracted from its outputs, we explicitly align lighting effects with the scene structure, enhancing physical plausibility. Nevertheless, this feedback requires high-quality outputs for supervision in visual space, making standard multi-step denoising computationally expensive. To mitigate this, we employ path consistency learning, allowing supervision to remain effective even under few-step training regimes. To enable fine-grained relighting control and supervision, we design a structured six-dimensional annotation protocol capturing core illumination attributes. Building upon this, we propose LumosBench, a disentangled attribute-level benchmark that evaluates lighting controllability via large vision-language models, enabling automatic and interpretable assessment of relighting precision across individual dimensions. Extensive experiments demonstrate that UniLumos achieves state-of-the-art relighting quality with significantly improved physical consistency, while delivering a 20x speedup for both image and video relighting. Code is available at https://github.com/alibaba-damo-academy/Lumos-Custom.
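The geometry feedback described in the abstract can be illustrated with a minimal sketch. It assumes that depth and normal maps have already been extracted from the relit output and from a reference by some estimator, which is not shown here. The function names, the L1 depth term, the cosine normal term, and the weights `w_depth` and `w_normal` are all illustrative assumptions, not the loss used by UniLumos.

```python
import math
from typing import Sequence, Tuple

Normal = Tuple[float, float, float]


def depth_l1(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute difference between two flattened depth maps."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def normal_cosine_loss(pred: Sequence[Normal], ref: Sequence[Normal]) -> float:
    """Mean (1 - cosine similarity) between per-pixel surface normals."""
    total = 0.0
    for p, r in zip(pred, ref):
        dot = sum(a * b for a, b in zip(p, r))
        scale = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in r))
        total += 1.0 - dot / max(scale, 1e-8)  # guard against zero-length normals
    return total / len(pred)


def geometry_feedback_loss(
    depth_pred: Sequence[float],
    depth_ref: Sequence[float],
    normals_pred: Sequence[Normal],
    normals_ref: Sequence[Normal],
    w_depth: float = 1.0,   # hypothetical weight
    w_normal: float = 1.0,  # hypothetical weight
) -> float:
    """Weighted sum of the depth and normal discrepancies (illustrative only)."""
    return (w_depth * depth_l1(depth_pred, depth_ref)
            + w_normal * normal_cosine_loss(normals_pred, normals_ref))
```

In a training loop, a term of this kind would be added to the flow matching objective so that lighting changes that distort the apparent scene structure are penalized.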
Problem

Research questions and friction points this paper is trying to address.

Addresses unrealistic lighting effects (overexposed highlights, misaligned shadows, incorrect occlusions) in image and video relighting
Enhances physical plausibility through geometry feedback integration
Achieves a 20x speedup while maintaining high-quality output
Innovation

Methods, ideas, or system contributions that make the work stand out.

RGB-space geometry feedback in a flow matching backbone
Path consistency learning for few-step training
Structured six-dimensional annotation for fine-grained control
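To make the few-step idea concrete, the sketch below shows Euler sampling along a flow and one common way to state a path consistency constraint: a single step of size 2d should agree with two consecutive steps of size d. This is a generic self-consistency formulation assumed for illustration, and the step-size-conditioned signature `velocity(x, t, d)` is hypothetical. It is not taken from the UniLumos code.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical signature: velocity(x, t, d) -> predicted velocity for a step of size d at time t.
VelocityFn = Callable[[Vector, float, float], Vector]


def euler_step(x: Vector, v: Vector, d: float) -> Vector:
    """Advance x by step size d along velocity v."""
    return [xi + d * vi for xi, vi in zip(x, v)]


def sample(velocity: VelocityFn, x0: Vector, num_steps: int) -> Vector:
    """Integrate from t=0 to t=1 with num_steps equal Euler steps."""
    d = 1.0 / num_steps
    x = list(x0)
    for k in range(num_steps):
        x = euler_step(x, velocity(x, k * d, d), d)
    return x


def path_consistency_residual(velocity: VelocityFn, x: Vector, t: float, d: float) -> float:
    """Squared gap between one step of size 2d and the average of two steps of size d."""
    v_big = velocity(x, t, 2 * d)
    v_first = velocity(x, t, d)
    x_mid = euler_step(x, v_first, d)
    v_second = velocity(x_mid, t + d, d)
    target = [(a + b) / 2 for a, b in zip(v_first, v_second)]
    return sum((p - q) ** 2 for p, q in zip(v_big, target))


# Toy velocity field that always points straight at a fixed target,
# so every trajectory is a straight line and any step count lands on it.
TARGET = [1.0, -1.0]


def toy_velocity(x: Vector, t: float, d: float) -> Vector:
    return [(g - xi) / (1.0 - t) for g, xi in zip(TARGET, x)]
```

With this toy field, `sample(toy_velocity, [0.0, 2.0], 1)` and `sample(toy_velocity, [0.0, 2.0], 4)` reach the same endpoint up to floating-point error, and the residual is close to zero. A learned model would be trained to drive this residual down so that few-step sampling stays close to many-step sampling.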
Authors
Ropeway Liu
Zhejiang University, DAMO Academy, Alibaba Group
Hangjie Yuan
Alibaba DAMO | ZJU | MMLab@NTU
Generative Models, Multimodal Models, Foundation Models, Video Understanding
Bo Dong
DAMO Academy, Alibaba Group, Hupan Lab
Jiazheng Xing
Zhejiang University
Generative AI, Video Understanding, Representation Learning
Jinwang Wang
DAMO Academy, Alibaba Group, Hupan Lab, Zhejiang University
Rui Zhao
National University of Singapore
Yan Xing
DAMO Academy, Alibaba Group, Hupan Lab
Weihua Chen
Alibaba DAMO Academy, previously NLPR, CASIA
Computer Vision
Fan Wang
DAMO Academy, Alibaba Group