Signal: Selective Interaction and Global-local Alignment for Multi-Modal Object Re-Identification

📅 2025-11-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address severe background interference and the difficulty of ensuring cross-modal consistency in multi-modal object re-identification (ReID), this paper proposes a Selective Interaction and Global-Local Alignment framework. Methodologically, it introduces a Selective Interaction Module (SIM) that filters salient image patch tokens and enhances their interaction with class tokens, suppressing background noise and improving feature discriminability. It further designs a Global Alignment Module (GAM) and a Local Alignment Module (LAM) in a 3D Gramian space, jointly leveraging self-attention and geometric constraints to achieve cross-modal global semantic consistency and local structural alignment. Extensive experiments on three challenging benchmarks (RGBNT201, RGBNT100, and MSVR310) demonstrate significant improvements over state-of-the-art methods, validating both the effectiveness and the generalizability of the proposed approach.

📝 Abstract
Multi-modal object Re-IDentification (ReID) is devoted to retrieving specific objects by exploiting complementary multi-modal image information. Existing methods mainly concentrate on the fusion of multi-modal features, yet neglect background interference. Besides, current multi-modal fusion methods often focus on aligning modality pairs but struggle with multi-modal consistency alignment. To address these issues, we propose a novel selective interaction and global-local alignment framework called Signal for multi-modal object ReID. Specifically, we first propose a Selective Interaction Module (SIM) to select important patch tokens using intra-modal and inter-modal information. These important patch tokens interact with the class tokens, thereby yielding more discriminative features. Then, we propose a Global Alignment Module (GAM) to simultaneously align multi-modal features by minimizing the volume of 3D polyhedra in the Gramian space. Meanwhile, we propose a Local Alignment Module (LAM) to align local features in a shift-aware manner. With these modules, our framework can extract more discriminative features for object ReID. Extensive experiments on three multi-modal object ReID benchmarks (i.e., RGBNT201, RGBNT100, MSVR310) validate the effectiveness of our method. The source code is available at https://github.com/010129/Signal.
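The abstract's Selective Interaction Module selects salient patch tokens and lets them interact with the class token. The paper's exact scoring and interaction rules are not given here, so the following is only a minimal sketch of the idea, assuming similarity-to-class-token scoring with top-k selection; the function name and shapes are hypothetical:

```python
import torch

def selective_interaction(patch_tokens, cls_token, k):
    """Hypothetical sketch of selective token interaction (not the
    paper's exact SIM): score each patch token by its similarity to
    the class token, keep the top-k salient patches, and refine the
    class token by attending over those patches only.

    patch_tokens: (N, D) patch embeddings
    cls_token:    (D,)   class-token embedding
    """
    scores = patch_tokens @ cls_token                 # (N,) saliency scores
    topk_idx = torch.topk(scores, k).indices          # indices of salient patches
    selected = patch_tokens[topk_idx]                 # (k, D) selected tokens
    # class token interacts only with the selected (non-background) tokens
    attn = torch.softmax(selected @ cls_token, dim=0) # (k,) attention weights
    refined_cls = cls_token + attn @ selected         # (D,) refined class token
    return selected, refined_cls
```

Restricting the interaction to top-k tokens is what suppresses background patches: low-scoring (background) tokens never contribute to the refined class token.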
Problem

Research questions and friction points this paper is trying to address.

Addresses background interference in multi-modal object re-identification systems
Solves multi-modal consistency alignment issues in feature fusion methods
Enhances discriminative feature extraction through selective interaction mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective interaction module filters important patch tokens
Global alignment minimizes 3D polyhedra volume in Gramian space
Local alignment performs shift-aware local feature matching
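The global-alignment idea above can be illustrated geometrically: three modality features span a parallelepiped whose volume is the square root of the determinant of their Gram matrix, and driving that volume toward zero forces the features toward a shared direction. This is a minimal sketch under that interpretation, not the paper's exact GAM loss; the function name and normalization choice are assumptions:

```python
import torch

def gramian_volume(f_rgb, f_nir, f_tir):
    """Volume of the parallelepiped spanned by three L2-normalized
    modality features. The Gram matrix G holds pairwise inner products;
    sqrt(det(G)) is the spanned volume, which is 0 when the three
    features are aligned and 1 when they are mutually orthogonal.
    Minimizing it encourages cross-modal consistency."""
    feats = torch.stack([
        f_rgb / f_rgb.norm(),
        f_nir / f_nir.norm(),
        f_tir / f_tir.norm(),
    ])                                          # (3, D) unit features
    gram = feats @ feats.T                      # (3, 3) Gram matrix
    det = torch.clamp(torch.det(gram), min=0.0) # guard tiny negatives
    return torch.sqrt(det)
```

A key property of this formulation is that it aligns all three modalities jointly in one term, rather than summing three pairwise alignment losses.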
Yangyang Liu
CASIA
Yuhao Wang
School of Future Technology, Dalian University of Technology
Pingping Zhang
School of Future Technology, Dalian University of Technology