3M-TI: High-Quality Mobile Thermal Imaging via Calibration-free Multi-Camera Cross-Modal Diffusion

📅 2025-11-24
🤖 AI Summary
Mobile thermal imaging suffers from low spatial resolution and blurred texture due to sensor miniaturization. Existing single-image super-resolution (SR) methods fail to recover fine details, while RGB-guided approaches rely on laborious and error-prone cross-camera calibration. To address this, the paper proposes 3M-TI, a calibration-free, multi-camera cross-modal diffusion SR framework. The method embeds a cross-modal self-attention module into the diffusion U-Net, replacing the original self-attention layers, so that RGB and thermal features are aligned adaptively throughout denoising without explicit camera calibration. Guided by the RGB features and the diffusion model's generative prior, the network reconstructs thermal images with higher resolution, structural fidelity, and texture detail. Experiments on real mobile thermal cameras and public benchmarks report state-of-the-art PSNR and SSIM, better visual quality, and improved downstream thermal object detection and segmentation.
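As a reminder of what the PSNR figure measures, here is a minimal sketch in plain Python. It is the standard textbook definition, not the paper's evaluation code; the function name `psnr`, the nested-list image format, and the `max_value` parameter are assumptions made for illustration.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-sized 2D images.

    Images are nested lists of floats in [0, max_value] (an assumed format).
    """
    flat_ref = [p for row in reference for p in row]
    flat_est = [p for row in estimate for p in row]
    if len(flat_ref) != len(flat_est):
        raise ValueError("images must have the same number of pixels")

    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(flat_ref, flat_est)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images

    # PSNR = 10 * log10(MAX^2 / MSE); higher means closer to the reference.
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means the super-resolved thermal image is closer, pixel by pixel, to the high-resolution reference. SSIM complements it by comparing local structure rather than raw pixel error.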

📝 Abstract
The miniaturization of thermal sensors for mobile platforms inherently limits their spatial resolution and textural fidelity, leading to blurry and less informative images. Existing thermal super-resolution (SR) methods can be grouped into single-image and RGB-guided approaches: the former struggles to recover fine structures from limited information, while the latter relies on accurate and laborious cross-camera calibration, which hinders practical deployment and robustness. Here, we propose 3M-TI, a calibration-free Multi-camera cross-Modality diffusion framework for Mobile Thermal Imaging. At its core, 3M-TI integrates a cross-modal self-attention module (CSM) into the diffusion UNet, replacing the original self-attention layers to adaptively align thermal and RGB features throughout the denoising process, without requiring explicit camera calibration. This design enables the diffusion network to leverage its generative prior to enhance spatial resolution, structural fidelity, and texture detail in the super-resolved thermal images. Extensive evaluations on real-world mobile thermal cameras and public benchmarks validate our superior performance, achieving state-of-the-art results in both visual quality and quantitative metrics. More importantly, the thermal images enhanced by 3M-TI lead to substantial gains in critical downstream tasks like object detection and segmentation, underscoring its practical value for robust mobile thermal perception systems. More materials: https://github.com/work-submit/3MTI.
Problem

Research questions and friction points this paper is trying to address.

Enhancing low-resolution thermal images from mobile sensors without calibration
Overcoming limitations of single-image and RGB-guided super-resolution methods
Improving thermal image quality for object detection and segmentation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Calibration-free multi-camera cross-modal diffusion framework
Cross-modal self-attention module (CSM) replaces the original self-attention layers in the diffusion UNet
Adaptively aligns thermal-RGB features during denoising process
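
The idea behind the points above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' CSM: the source does not specify the module's internals, so the sketch assumes thermal tokens act as queries over a joint sequence of thermal and RGB tokens, and it uses identity projections in place of learned query/key/value weights. The names `cross_modal_self_attention`, `thermal_tokens`, and `rgb_tokens_per_camera` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_modal_self_attention(thermal_tokens, rgb_tokens_per_camera):
    """Let each thermal token attend over thermal and RGB tokens jointly.

    thermal_tokens: list of feature vectors (lists of floats).
    rgb_tokens_per_camera: one list of feature vectors per RGB camera.
    Assumption: identity Q/K/V projections; a real layer would learn them.
    """
    # Joint key/value sequence: thermal tokens, then every camera's RGB tokens.
    context = list(thermal_tokens)
    for camera_tokens in rgb_tokens_per_camera:
        context.extend(camera_tokens)

    dim = len(thermal_tokens[0])
    scale = 1.0 / math.sqrt(dim)

    aligned = []
    for query in thermal_tokens:
        # Similarity to every token decides where this thermal token looks,
        # so no geometric calibration between cameras is needed here.
        weights = softmax([dot(query, key) * scale for key in context])
        aligned.append(
            [sum(w * value[d] for w, value in zip(weights, context))
             for d in range(dim)]
        )
    return aligned


thermal = [[1.0, 0.0], [0.0, 1.0]]
rgb_views = [[[1.0, 0.0]], [[0.0, 1.0]]]  # two cameras, one token each
fused = cross_modal_self_attention(thermal, rgb_views)
```

Because the attention weights depend only on feature similarity, each thermal token can draw on whichever RGB tokens match it, regardless of where the cameras sit relative to one another. That is the sense in which alignment is learned rather than calibrated.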
Minchong Chen
Xiaoyun Yuan
Junzhe Wan
Jianing Zhang
Purdue University
Federated Learning, Multiple Agent Systems, Differential Privacy
Jun Zhang