Missing No More: Dictionary-Guided Cross-Modal Image Fusion under Missing Infrared

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of generating controllable and interpretable fusion results in the absence of infrared modality by proposing a coefficient-domain fusion framework based on a shared convolutional dictionary. The method jointly learns a cross-modal shared atom space under infrared-deficient conditions, enabling high-quality reconstruction through visible-guided infrared coefficient inference and adaptive atom-level fusion. A large language model is incorporated as a weak semantic prior to facilitate closed-loop optimization. By integrating window-based attention with a convolutional hybrid mechanism, the proposed approach significantly enhances both the perceptual quality of fused images and downstream object detection performance.

📝 Abstract
Infrared-visible (IR-VIS) image fusion is vital for perception and security, yet most methods rely on the availability of both modalities during training and inference. When the infrared modality is absent, pixel-space generative substitutes become hard to control and inherently lack interpretability. We address missing-IR fusion by proposing a dictionary-guided, coefficient-domain framework built upon a shared convolutional dictionary. The pipeline comprises three key components: (1) Joint Shared-dictionary Representation Learning (JSRL) learns a unified and interpretable atom space shared by both IR and VIS modalities; (2) VIS-Guided IR Inference (VGII) transfers VIS coefficients to pseudo-IR coefficients in the coefficient domain and performs a one-step closed-loop refinement guided by a frozen large language model as a weak semantic prior; and (3) Adaptive Fusion via Representation Inference (AFRI) merges VIS structures and inferred IR cues at the atom level through window attention and convolutional mixing, followed by reconstruction with the shared dictionary. This encode-transfer-fuse-reconstruct pipeline avoids uncontrolled pixel-space generation while ensuring prior preservation within interpretable dictionary-coefficient representation. Experiments under missing-IR settings demonstrate consistent improvements in perceptual quality and downstream detection performance. To our knowledge, this represents the first framework that jointly learns a shared dictionary and performs coefficient-domain inference-fusion to tackle missing-IR fusion. The source code is publicly available at https://github.com/harukiv/DCMIF.
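The encode-transfer-fuse-reconstruct pipeline described above can be sketched in the coefficient domain. The snippet below is a minimal illustration, not the paper's implementation: the dictionary, the VIS-to-IR coefficient transfer, and the per-atom fusion gate are all random or hand-rolled stand-ins for the learned JSRL, VGII, and AFRI modules, and all sizes are hypothetical.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)

# Shared convolutional dictionary: K small atoms (hypothetical sizes).
K, atom = 4, 5
D = rng.standard_normal((K, atom, atom))

# VIS coefficient maps (in the paper, produced by the learned shared-
# dictionary encoder; here random stand-ins).
H = W = 32
c_vis = rng.standard_normal((K, H, W))

# VGII stand-in: a per-atom linear transfer from VIS coefficients to
# pseudo-IR coefficients (the paper uses a learned inference module
# refined with an LLM-based weak semantic prior).
transfer = np.eye(K) + 0.1 * rng.standard_normal((K, K))
c_ir = np.einsum('jk,khw->jhw', transfer, c_vis)

# AFRI stand-in: adaptive atom-level fusion via a sigmoid gate per atom
# (the paper uses window attention plus convolutional mixing).
gate = 1.0 / (1.0 + np.exp(-c_vis.mean(axis=(1, 2))))
c_fused = gate[:, None, None] * c_vis + (1 - gate)[:, None, None] * c_ir

# Reconstruction with the shared dictionary: the fused image is the sum
# of each atom convolved with its fused coefficient map.
fused = sum(convolve2d(c_fused[k], D[k], mode='same') for k in range(K))
print(fused.shape)  # (32, 32)
```

The key property this sketch mirrors is that all inference and fusion happen on interpretable dictionary coefficients; pixels are only produced at the final reconstruction step, which is what keeps the output controllable compared with pixel-space generation.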
Problem

Research questions and friction points this paper is trying to address.

infrared-visible image fusion
missing modality
cross-modal fusion
interpretable representation
coefficient-domain inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

dictionary-guided fusion
coefficient-domain inference
missing modality
cross-modal image fusion
shared convolutional dictionary