🤖 AI Summary
This work addresses the modality gap in visible-infrared person re-identification (VI-ReID), which the authors attribute largely to differing lighting conditions, including light wavelength and light source type. To bridge this gap, they propose a Multi-Frequency Expert Network (MFEN) that modulates multiple frequency bands and combines them adaptively through a mixture-of-experts design, where existing frequency-based methods either do not distinguish bands or focus on a single one. Two training strategies, Random Frequency Augmentation (RFA) and Frequency Auxiliary Optimization (FAO), complement MFEN to support robust cross-modal feature learning. Experiments on three VI-ReID datasets demonstrate the effectiveness of the approach.
📝 Abstract
Visible-infrared person re-identification (VI-ReID) is challenging due to the large modality discrepancy between visible and infrared images. We contend that this discrepancy is largely related to differing lighting conditions, including differences in light wavelength and light source type. Recently, frequency-based VI-ReID approaches have achieved notable success because frequency-domain representations can better capture identity-relevant contours and details while excluding irrelevant lighting and color. However, existing methods either do not distinguish different frequency bands or focus on only one band, which is insufficient under diverse lighting conditions. To perform comprehensive frequency-domain learning, we propose a Multi-Frequency Expert Network (MFEN) that enables multi-frequency modulation and adaptively combines different bands through a mixture-of-experts design. We further introduce Random Frequency Augmentation (RFA) and Frequency Auxiliary Optimization (FAO) to better train MFEN. The three modules are complementary and jointly capture critical frequency-domain details for robust representation learning. Extensive experiments on three VI-ReID datasets demonstrate the effectiveness of our approach.
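To make the idea of adaptively combining frequency bands concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a feature map of shape (C, H, W), splits its 2-D spectrum into three radial bands, and mixes the per-band components with softmax gate weights. The function names (`split_bands`, `moe_fuse`), the band edges, and the linear gate `gate_w` are all hypothetical, and the learned per-band expert networks that a real mixture-of-experts model would use are replaced here by identity mappings.

```python
import numpy as np


def radial_band_masks(h, w, edges):
    """Boolean masks that partition the centered 2-D spectrum into radial bands."""
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    bounds = [0.0, *edges, np.inf]
    return [(radius >= lo) & (radius < hi) for lo, hi in zip(bounds[:-1], bounds[1:])]


def split_bands(feat, edges=(0.1, 0.25)):
    """Split a (C, H, W) feature map into per-band spatial components.

    The band edges are illustrative values in cycles per pixel (assumption).
    Returns an array of shape (num_bands, C, H, W) whose sum over bands
    reconstructs the input.
    """
    _, h, w = feat.shape
    spec = np.fft.fftshift(np.fft.fft2(feat, axes=(-2, -1)), axes=(-2, -1))
    bands = []
    for mask in radial_band_masks(h, w, edges):
        band_spec = np.fft.ifftshift(spec * mask, axes=(-2, -1))
        bands.append(np.fft.ifft2(band_spec, axes=(-2, -1)).real)
    return np.stack(bands)


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def moe_fuse(feat, gate_w, edges=(0.1, 0.25)):
    """Fuse frequency bands with input-dependent gate weights.

    gate_w is a hypothetical (num_bands, C) linear gate applied to the
    globally pooled feature; each "expert" is an identity map on its band.
    """
    bands = split_bands(feat, edges)              # (num_bands, C, H, W)
    pooled = feat.mean(axis=(1, 2))               # (C,)
    weights = softmax(gate_w @ pooled)            # (num_bands,)
    fused = np.tensordot(weights, bands, axes=1)  # (C, H, W)
    return fused, weights
```

Because the masks partition the spectrum, summing the band components recovers the input, so the gate only decides how strongly each band contributes. In a trained model the gate and the per-band experts would be learned jointly; the sketch only shows the data flow.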