🤖 AI Summary
To address severe noise, interference from bright light sources, and significant detail loss in nighttime low-light image enhancement, this paper proposes an RGB-thermal cross-modal fusion framework. Its core methodological contribution is an RGB-thermal cross-attention mechanism that adaptively aligns features from the two modalities so that each complements the other. The paper also introduces V-TIEE, the first registered visible-thermal benchmark dataset for night-scene enhancement, comprising 50 image pairs from multiple scenarios, and designs an end-to-end fusion network driven by self-attention. Joint training and evaluation on LLVIP and V-TIEE show consistent improvements over state-of-the-art methods: average PSNR increases by 1.8 dB and SSIM by 0.023. The code and the V-TIEE dataset are publicly released.
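For readers unfamiliar with the PSNR metric quoted above, the following is a minimal sketch of how it is commonly computed. It assumes 8-bit images with a peak value of 255; the function name `psnr` and its parameters are illustrative and are not taken from the paper's evaluation code.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    Assumption: pixel values lie in [0, max_value] (255 for 8-bit images).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Mean squared error over all pixels and channels.
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a gain of 1.8 dB corresponds to the mean squared error dropping by a factor of 10^0.18, or roughly 1.5.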
📝 Abstract
In nighttime conditions, high noise levels and bright illumination sources degrade image quality, making low-light image enhancement challenging. Thermal images provide complementary information, offering richer textures and structural details. We propose RT-X Net, a cross-attention network that fuses RGB and thermal images for nighttime image enhancement. RT-X Net uses self-attention networks for feature extraction and a cross-attention mechanism for fusion, integrating information from both modalities. To support research in this domain, we introduce the Visible-Thermal Image Enhancement Evaluation (V-TIEE) dataset, comprising 50 co-located visible and thermal images captured under diverse nighttime conditions. Extensive evaluations on the publicly available LLVIP dataset and our V-TIEE dataset demonstrate that RT-X Net outperforms state-of-the-art methods in low-light image enhancement. The code and the V-TIEE dataset are available at https://github.com/jhakrraman/rt-xnet.
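The cross-attention fusion described in the abstract can be illustrated with a minimal NumPy sketch of scaled dot-product cross-attention, in which RGB features act as queries and thermal features supply keys and values. This is not RT-X Net's implementation: the single attention head, the residual connection, the choice of which modality provides the queries, the random projection matrices, and all names (`cross_attention`, `w_q`, `w_k`, `w_v`) are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(rgb_tokens, thermal_tokens, w_q, w_k, w_v):
    """Fuse thermal information into RGB features (illustrative sketch).

    rgb_tokens:     (N, C) flattened RGB feature map
    thermal_tokens: (M, C) flattened thermal feature map
    w_q, w_k:       (C, D) hypothetical query/key projections
    w_v:            (C, C) hypothetical value projection
    Returns the fused (N, C) features and the (N, M) attention weights.
    """
    q = rgb_tokens @ w_q       # queries come from the RGB branch
    k = thermal_tokens @ w_k   # keys come from the thermal branch
    v = thermal_tokens @ w_v   # values come from the thermal branch
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    # Residual connection: RGB features plus attended thermal features.
    fused = rgb_tokens + weights @ v
    return fused, weights


# Toy example with random features standing in for extracted features.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 8))
thermal = rng.standard_normal((16, 8))
fused, weights = cross_attention(
    rgb,
    thermal,
    rng.standard_normal((8, 4)),
    rng.standard_normal((8, 4)),
    rng.standard_normal((8, 8)),
)
print(fused.shape, weights.shape)
```

In a trained network the projection matrices would be learned, and the token sequences would come from the self-attention feature extractors rather than from random numbers.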