🤖 AI Summary
This paper targets heterogeneous face recognition (HFR), specifically robust matching of thermal or near-infrared images against visible-light images, on resource-constrained edge devices, and proposes a lightweight hybrid CNN-Transformer architecture. The method combines local convolutional feature modeling with global cross-modal self-attention and adds an explicit cross-modal feature alignment mechanism, which allows end-to-end training with only a small number of paired heterogeneous samples. Compared to state-of-the-art methods, the model reduces FLOPs by 42% on average while maintaining competitive parameter efficiency, and it achieves new state-of-the-art performance across multiple HFR benchmarks. It also preserves high accuracy on homogeneous RGB face recognition, so a single efficient model covers both heterogeneous and homogeneous face recognition tasks.
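The summary does not say what form the cross-modal feature alignment mechanism takes. As a rough illustration only, one common way to align paired embeddings is to penalize the cosine distance between the visible-light embedding and the thermal or near-infrared embedding of the same identity. The sketch below assumes that form; the function names (`l2_normalize`, `cosine_similarity`, `alignment_loss`) and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumed preprocessing, not from the paper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def alignment_loss(visible_embs, other_embs):
    """Mean cosine distance over paired embeddings.

    visible_embs[i] and other_embs[i] are assumed to come from the same
    identity, captured in the visible and the thermal/NIR modality.
    The loss is 0 when every pair points in the same direction.
    """
    if not visible_embs or len(visible_embs) != len(other_embs):
        raise ValueError("need a non-empty list of paired embeddings")
    distances = [
        1.0 - cosine_similarity(v, o)
        for v, o in zip(visible_embs, other_embs)
    ]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # Toy 2-D embeddings: the first pair is aligned, the second is orthogonal.
    visible = [[1.0, 0.0], [1.0, 0.0]]
    thermal = [[2.0, 0.0], [0.0, 1.0]]
    print(alignment_loss(visible, thermal))  # 0.5
```

In a real training setup a term like this would typically be added to the identity classification loss and computed on batches of network outputs; the pure-Python version here only shows the arithmetic.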
📝 Abstract
Heterogeneous Face Recognition (HFR) addresses the challenge of matching face images across different sensing modalities, such as thermal-to-visible or near-infrared-to-visible, expanding the applicability of face recognition systems in real-world, unconstrained environments. While recent HFR methods have shown promising results, many rely on computationally intensive architectures, which limits their practicality for deployment on resource-constrained edge devices. In this work, we present a lightweight yet effective HFR framework by adapting a hybrid CNN-Transformer architecture originally designed for face recognition. Our approach enables efficient end-to-end training with minimal paired heterogeneous data while preserving strong performance on standard RGB face recognition tasks. This makes it a compelling solution for both homogeneous and heterogeneous scenarios. Extensive experiments across multiple challenging HFR and face recognition benchmarks demonstrate that our method consistently outperforms state-of-the-art approaches while maintaining low computational overhead.