🤖 AI Summary
Thermal imaging data is scarce for challenging conditions such as nighttime, fog, and dust, and this scarcity severely limits perception for autonomous navigation. To address it, this paper proposes an RGB-to-thermal image synthesis method that combines a conditional diffusion framework with self-attention. For the first time, self-attention is integrated into the conditional diffusion process to explicitly model object-specific thermal radiation properties and cross-modal semantic alignment, enabling high-fidelity, physically consistent thermal image generation. Crucially, the method operates without paired ground-truth thermal images: it requires only input RGB images to synthesize high-quality thermal counterparts. The generated thermal images can be added directly to mainstream multimodal autonomous driving datasets (e.g., nuScenes, KITTI). Experimental results demonstrate substantial performance gains on downstream thermal perception tasks, establishing a reliable, cost-effective data augmentation approach for deploying thermal sensing in resource-constrained autonomous systems.
📝 Abstract
Autonomous systems rely on sensors to estimate the environment around them. However, cameras, LiDARs, and RADARs each have their own limitations. At night or in degraded environments such as fog, mist, or dust, thermal cameras can provide valuable information about the presence of objects of interest through their heat signatures. They make it easy to identify humans and vehicles, which are usually warmer than their surroundings. In this paper, we focus on the adoption of thermal cameras for robotics and automation, where the biggest hurdle is the lack of data. Several multi-modal datasets are available to support robotics research in tasks such as scene segmentation, object detection, and depth estimation, which are the cornerstones of autonomous systems. However, these datasets lack thermal imagery. Our paper proposes a solution that augments these datasets with synthetic thermal data to enable widespread and rapid adoption of thermal cameras. We explore the use of conditional diffusion models to convert existing RGB images to thermal images, using self-attention to learn the thermal properties of real-world objects.