🤖 AI Summary
This work addresses the problem that illumination variations drastically alter visual appearance and impair model generalization. Inspired by human visual mechanisms, it proposes an approach that explicitly incorporates illumination as a training signal within a contrastive learning framework. By leveraging controllable rendered data and an auxiliary illumination-aware objective, the method jointly models semantic consistency and illumination-structure dependencies. The resulting representations are sensitive to lighting conditions yet semantically stable. Experiments show consistent improvements over standard contrastive learning baselines on image classification and object detection across the ImageNet, ExDark, and PASCAL VOC benchmarks. The approach is also compatible with supervised learning and shows promising results in settings with simpler illumination variation.
📝 Abstract
Variations in illumination remain a major challenge for visual representation learning, as they induce substantial appearance changes both across and within environments. Existing approaches typically address this issue through data augmentations that encourage models to become invariant to lighting changes, but such strategies do not explicitly model lighting information during learning. Inspired by theories of human vision, we propose a lighting-aware representation learning framework that treats illumination variation as an explicit training signal rather than a nuisance factor to be suppressed. Our method extends contrastive learning with an auxiliary objective that captures illumination-dependent variation in rendered scenes, enabling the model to learn representations that preserve semantic consistency while remaining sensitive to lighting-dependent visual structure. We evaluate the proposed model on image classification and object detection tasks across the ImageNet, ExDark, and PASCAL VOC benchmarks. Results demonstrate that lighting-aware training consistently improves downstream performance over standard contrastive learning baselines while keeping the same architecture and training budget. Furthermore, our approach shows promising performance within supervised learning frameworks and in settings with simpler lighting variation, suggesting broad applicability beyond complex illumination scenarios. These results indicate its potential to improve model robustness and adaptability both in complex visual environments and in more conventional image processing tasks.
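The abstract does not specify the form of the auxiliary illumination-aware objective. As a rough illustration only, the sketch below pairs a standard InfoNCE contrastive loss (two views of the same scene are pulled together, other scenes in the batch act as negatives) with a hypothetical auxiliary term that regresses the known lighting parameters of each rendered image from its embedding. The function names, the linear head `W`, the mean-squared-error target, and the weight `lam` are all assumptions made for this sketch, not the authors' method.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row of x to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(z1, z2, temperature=0.1):
    """Standard InfoNCE: row i of z1 should match row i of z2 (positive pair);
    every other row of z2 acts as a negative."""
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    logits = z1 @ z2.T / temperature                      # (N, N) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # cross-entropy on the diagonal


def lighting_aux_loss(features, light_params, W):
    """HYPOTHETICAL auxiliary term: a linear head W predicts the lighting
    parameters used to render each image (e.g. light direction, intensity)
    from its features; scored with mean squared error."""
    pred = features @ W
    return np.mean((pred - light_params) ** 2)


def lighting_aware_loss(z1, z2, light1, light2, W, lam=0.5, temperature=0.1):
    """Assumed combination: contrastive term + lam * lighting term averaged
    over both views. Returns (total, contrastive, lighting)."""
    l_con = info_nce(z1, z2, temperature)
    l_light = 0.5 * (lighting_aux_loss(z1, light1, W) + lighting_aux_loss(z2, light2, W))
    return l_con + lam * l_light, l_con, l_light


# Toy usage: random embeddings stand in for encoder outputs.
rng = np.random.default_rng(0)
N, D, K = 8, 16, 4                         # batch size, embedding dim, lighting params
z1 = rng.normal(size=(N, D))               # view 1: each scene under lighting A
z2 = z1 + 0.1 * rng.normal(size=(N, D))    # view 2: same scenes under lighting B
light1 = rng.normal(size=(N, K))           # rendering-time lighting parameters, view 1
light2 = rng.normal(size=(N, K))           # rendering-time lighting parameters, view 2
W = 0.1 * rng.normal(size=(D, K))          # hypothetical lighting-regression head

total, l_con, l_light = lighting_aware_loss(z1, z2, light1, light2, W)
print(f"total={total:.3f} contrastive={l_con:.3f} lighting={l_light:.3f}")
```

Rendered data makes the lighting parameters available at no labelling cost, so a regression target of this kind is one plausible way to keep embeddings lighting-sensitive while the contrastive term keeps views of the same scene close; the objective actually used in this work may differ.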