🤖 AI Summary
This study presents the first systematic evaluation of open-source multimodal large language models (MLLMs) on zero-shot face recognition. Addressing the gap in understanding MLLMs’ capability for identity-level visual reasoning, we conduct reproducible benchmarking across standard datasets—including LFW, CFP, and AgeDB—using a unified image encoding and text-prompting framework. Results demonstrate that while MLLMs capture high-level facial semantics, their recognition accuracy lags significantly behind task-specific models, particularly in fine-grained identity discrimination and cross-pose or cross-age scenarios. Our contributions are threefold: (1) the first standardized benchmarking protocol for face recognition tailored to open-source MLLMs; (2) a lightweight, modular evaluation framework, publicly released; and (3) an empirical characterization of MLLMs’ capabilities and limitations in identity discrimination, identifying concrete bottlenecks and actionable directions for improving visual-linguistic models toward high-precision perception tasks.
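The unified image-encoding and text-prompting protocol described above can be sketched as follows. This is a minimal illustration, not the released framework: the helper names (`encode_image`, `verification_accuracy`) and the exact prompt wording are assumptions, and `ask_model` stands in for whatever MLLM backend is being benchmarked.

```python
import base64
from typing import Callable, Iterable, Tuple

def encode_image(path: str) -> str:
    # Read image bytes and base64-encode them, the form most MLLM APIs expect.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

# Hypothetical verification prompt; the actual benchmark prompt may differ.
VERIFICATION_PROMPT = (
    "Do these two face images show the same person? Answer only 'yes' or 'no'."
)

def parse_answer(text: str) -> bool:
    # Map free-form model output to a binary same/different decision.
    return text.strip().lower().startswith("yes")

def verification_accuracy(
    pairs: Iterable[Tuple[str, str, bool]],
    ask_model: Callable[[str, str, str], str],
) -> float:
    # pairs: (encoded_image_a, encoded_image_b, is_same_identity) triples,
    # as in LFW-style verification splits. ask_model(prompt, img_a, img_b)
    # is any MLLM backend returning a text answer.
    correct = total = 0
    for img_a, img_b, is_same in pairs:
        answer = parse_answer(ask_model(VERIFICATION_PROMPT, img_a, img_b))
        correct += int(answer == is_same)
        total += 1
    return correct / total if total else 0.0
```

Because the model interface is a plain callable, the same loop can score any open-source MLLM against any verification-style dataset without per-model code changes.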
📝 Abstract
Multimodal large language models (MLLMs) have achieved remarkable performance across diverse vision-and-language tasks. However, their potential for face recognition remains underexplored. In particular, open-source MLLMs have yet to be evaluated against existing face recognition models on standard benchmarks under comparable protocols. In this work, we present a systematic benchmark of state-of-the-art MLLMs on several standard face recognition datasets, including LFW, CALFW, CPLFW, CFP, AgeDB, and RFW. Experimental results reveal that while MLLMs capture rich semantic cues useful for face-related tasks, they lag behind specialized models in zero-shot, high-precision recognition scenarios. This benchmark provides a foundation for advancing MLLM-based face recognition, offering insights for the design of next-generation models with higher accuracy and stronger generalization. The source code of our benchmark is publicly available on the project page.