From Cross-Modal to Mixed-Modal Visible-Infrared Re-Identification

📅 2025-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing visible-infrared person re-identification (VI-ReID) methods struggle in real-world surveillance scenarios where gallery sets contain a mix of visible and infrared images, suffering from both cross-modal matching failure and insufficient intra-modal discriminability. To address this, the paper proposes a novel *mixed-modal ReID* setting, departing from conventional unimodal or purely cross-modal settings, and introduces MixER, a unified framework featuring orthogonal feature decomposition to disentangle modality-specific and modality-invariant identity representations, modality-confusion training to enhance robustness against modality shifts, and explicit identity-modality association modeling to suppress modality interference. All components are integrated into a single backbone network and optimized end-to-end. Extensive experiments on SYSU-MM01, RegDB, and LLCM demonstrate state-of-the-art performance, with significant improvements in both cross-modal and mixed-modal retrieval accuracy, validating the method's strong generalization capability and practical applicability.

📝 Abstract
Visible-infrared person re-identification (VI-ReID) aims to match individuals across different camera modalities, a critical task in modern surveillance systems. While current VI-ReID methods focus on cross-modality matching, real-world applications often involve mixed galleries containing both visible and infrared images, where state-of-the-art methods show significant performance limitations due to large domain shifts and low discrimination across mixed modalities. This is because gallery images from the same modality may have smaller domain gaps yet correspond to different identities. This paper introduces a novel mixed-modal ReID setting, where galleries contain data from both modalities. To address the inter-modal domain shift and the low discrimination capacity of intra-modal matching, we propose the Mixed Modality-Erased and -Related (MixER) method. The MixER learning approach disentangles modality-specific and modality-shared identity information through orthogonal decomposition, modality-confusion, and ID-modality-related objectives. MixER enhances feature robustness across modalities, improving performance in both cross-modal and mixed-modal settings. Our extensive experiments on the SYSU-MM01, RegDB, and LLCM datasets indicate that our approach provides state-of-the-art results using a single backbone, and showcase the flexibility of our approach in mixed-gallery applications.
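The orthogonal-decomposition objective mentioned in the abstract can be illustrated with a minimal sketch: the modality-shared and modality-specific components of each sample's feature are pushed to be mutually orthogonal, so that identity cues and modality cues occupy separate subspaces. The NumPy sketch below is an assumed, simplified form of such a constraint (per-sample inner products driven to zero); the function name and exact formulation are illustrative, not taken from the paper.

```python
import numpy as np

def orthogonality_loss(shared: np.ndarray, specific: np.ndarray) -> float:
    """Penalize overlap between modality-shared and modality-specific
    feature components.

    shared, specific: (batch, dim) arrays holding the two components
    produced for each sample by the decomposition branches.
    Returns the mean squared per-sample inner product; it is zero
    exactly when every sample's two components are orthogonal.
    """
    # Per-sample inner product between the two components.
    dots = np.sum(shared * specific, axis=1)
    return float(np.mean(dots ** 2))

# Toy check: components lying on orthogonal axes give zero loss,
# while identical components give a positive loss.
shared = np.array([[1.0, 0.0], [1.0, 0.0]])
specific = np.array([[0.0, 1.0], [0.0, 1.0]])
print(orthogonality_loss(shared, specific))  # → 0.0
print(orthogonality_loss(shared, shared))    # → 1.0
```

In a full training pipeline this term would be added to the identity and modality-confusion objectives and minimized jointly end-to-end.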
Problem

Research questions and friction points this paper is trying to address.

Pedestrian Re-Identification
Visible and Infrared Images
Accuracy Issues
Innovation

Methods, ideas, or system contributions that make the work stand out.

MixER method
Visible-Infrared ReID
Mixed-Modality Gallery