Cross-Modality Feature Fusion Based on Structured State Space Duality for Multimodal Image Registration Network

📅 2026-06-02
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the challenge of extracting shared structural information in multimodal image registration by proposing RegNetMamba-2, which introduces Structured State Space Duality (SSD) into a coarse-to-fine registration framework for the first time. By leveraging SSD to model both local and global structural features, and incorporating a cross-modal interaction (CMI) module alongside a progressive multi-scale fusion (MSF) mechanism, the method achieves highly efficient and accurate feature alignment. Extensive experiments on multiple benchmarks—including VIS-SAR, VIS-IR, and VIS-NIR—demonstrate that RegNetMamba-2 significantly outperforms existing deep learning-based approaches, setting new state-of-the-art results in both registration accuracy and inference efficiency.
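As a rough illustration of the "duality" in SSD (this is not the RegNetMamba-2 code), the sketch below assumes a single-channel sequence with a positive scalar decay per step. It shows that a linear-time recurrence and a masked, attention-like matrix product give the same output. The function names `ssd_recurrent` and `ssd_matrix`, and the shapes used, are hypothetical and chosen only for this example.

```python
import numpy as np

def ssd_recurrent(a, B, C, x):
    """Recurrent (linear-time) form of a scalar-decay state space model.

    h_t = a_t * h_{t-1} + B_t * x_t,   y_t = C_t . h_t
    a: (T,) positive decays, B and C: (T, N), x: (T,)
    """
    T, N = B.shape
    h = np.zeros(N)
    y = np.zeros(T)
    for t in range(T):
        h = a[t] * h + B[t] * x[t]   # update the hidden state
        y[t] = C[t] @ h              # read out a scalar output
    return y

def ssd_matrix(a, B, C, x):
    """Dual (quadratic) form: y = (L * (C @ B.T)) @ x.

    L[i, j] = a[j+1] * ... * a[i] for i >= j, and 0 for i < j.
    """
    log_cum = np.cumsum(np.log(a))
    # exp(log_cum[i] - log_cum[j]) is the product of decays between j and i
    L = np.tril(np.exp(log_cum[:, None] - log_cum[None, :]))
    return (L * (C @ B.T)) @ x

rng = np.random.default_rng(0)
T, N = 16, 4
a = rng.uniform(0.5, 1.0, size=T)   # assumed decays in (0, 1]
B = rng.normal(size=(T, N))
C = rng.normal(size=(T, N))
x = rng.normal(size=T)

same = np.allclose(ssd_recurrent(a, B, C, x), ssd_matrix(a, B, C, x))
```

The recurrent form scales linearly with sequence length, while the matrix form maps onto dense matrix multiplications. Being able to switch between the two is the general idea behind the efficiency claim.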
📝 Abstract
In multi-modal image registration, the primary challenge lies in extracting shared structural information. Compared with Transformers, Structured State Space Duality (SSD) offers stronger global structural feature extraction with higher efficiency during training and inference. Inspired by these advantages, we propose a novel algorithm for multi-modal image registration, named RegNetMamba-2. Our algorithm incorporates SSD into the coarse-to-fine matching process to extract local and global structural features effectively. First, SSD is applied at three different scales for multi-modal feature extraction in our network. To strengthen local representation, we pay more attention to foreground edges and structural information through the feature scaling function of SSD. Second, for shared feature extraction from the input images and multi-modal feature fusion at all scales, we propose a cross-modality feature fusion model based on SSD, consisting of a Cross-Modality feature Interaction (CMI) module and a Multi-Scale feature Fusion (MSF) module. The CMI module performs cross-modality feature extraction at each scale by applying SSD in a cross form. The MSF module employs progressive upward fusion at the feature level to obtain fine features that combine the multi-modal features of all scales. Following the coarse-to-fine scheme, the 1/8-scale features from CMI and the 1/2-scale features from MSF are used to calculate matching probability scores. We then establish the matching at each of the two levels through pixel-wise correspondences. Extensive experiments demonstrate that, compared with state-of-the-art deep-learning-based algorithms, RegNetMamba-2 achieves good results in both performance and efficiency for multi-modal image registration on the following datasets: VIS-SAR (OSDataset), VIS-IR (LGHD/RoadScene) and VIS-NIR (RGB-NIR Scene).
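The abstract does not say how the matching probability scores are computed. As a minimal sketch, assume a dual-softmax over feature similarities followed by a mutual-nearest-neighbour check, a common choice in coarse-to-fine matchers. This is an assumption for illustration, not the paper's confirmed method. The function names, the `temperature` value and the `threshold` value are all hypothetical.

```python
import numpy as np

def softmax(z, axis):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def matching_probability(feat_a, feat_b, temperature=0.1):
    """Assumed dual-softmax score between two sets of descriptors.

    feat_a: (Na, D) coarse features of image A (e.g. a flattened 1/8-scale map)
    feat_b: (Nb, D) coarse features of image B
    Returns an (Na, Nb) matrix of matching probabilities.
    """
    sim = feat_a @ feat_b.T / temperature
    return softmax(sim, axis=1) * softmax(sim, axis=0)

def mutual_matches(prob, threshold=0.2):
    """Keep pairs that are each other's best match and exceed the threshold."""
    row_best = prob.argmax(axis=1)            # best column for every row
    col_best = prob.argmax(axis=0)            # best row for every column
    idx_a = np.arange(prob.shape[0])
    keep = (col_best[row_best] == idx_a) & (prob[idx_a, row_best] > threshold)
    return np.stack([idx_a[keep], row_best[keep]], axis=1)

# Toy check: image B holds the same four descriptors in a shuffled order.
feat_a = np.eye(4)
perm = [2, 0, 3, 1]
feat_b = feat_a[perm]

prob = matching_probability(feat_a, feat_b)
matches = mutual_matches(prob)   # rows are (index in A, index in B)
```

In a coarse-to-fine pipeline, the coarse matches found this way would typically be refined in a small window on the higher-resolution feature map. Here that would correspond to the 1/2-scale features the abstract mentions.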
Problem

Research questions and friction points this paper is trying to address.

multimodal image registration
shared structural information
cross-modality feature fusion
feature extraction
image alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured State Space Duality
Cross-Modality Feature Fusion
Multimodal Image Registration
Coarse-to-Fine Matching
Multi-Scale Feature Fusion
🔎 Similar Papers
2024-08-01, Workshop on Biomedical Image Registration, Citations: 2
Zhikang Li
Remote Sensing Image Processing and Fusion Group, School of Electronic Engineering, Xidian University, Xi’an 710071, China
Yan Wu
Remote Sensing Image Processing and Fusion Group, School of Electronic Engineering, Xidian University, Xi’an 710071, China
Xin Hu
Emory University School of Medicine, Department of Radiation Oncology
Health Services Research, Health Economics, Health Policy, Cancer Outcomes
Yi Dai
Ph.D. Candidate, University of Michigan
process control, model predictive control
Ming Li
National Key Laboratory of Radar Signal Processing, Xidian University, Xi’an 710071, China