🤖 AI Summary
This study presents the first systematic evaluation of open-source multimodal large language models (MLLMs) on zero-shot face recognition. Addressing the gap in understanding MLLMs’ capability for identity-level visual reasoning, we conduct reproducible benchmarking across standard datasets—including LFW, CFP, and AgeDB—using a unified image encoding and text-prompting framework. Results demonstrate that while MLLMs capture high-level facial semantics, their recognition accuracy lags significantly behind task-specific models, particularly in fine-grained identity discrimination and cross-pose or cross-age scenarios. Our contributions are threefold: (1) the first standardized benchmarking protocol for face recognition tailored to open-source MLLMs; (2) a lightweight, modular evaluation framework, publicly released; and (3) an empirical characterization of MLLMs’ capabilities and limitations in identity discrimination, identifying concrete bottlenecks and actionable directions for improving visual-linguistic models toward high-precision perception tasks.
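The unified image-encoding and text-prompting protocol described above can be sketched as follows. This is a minimal illustration, not the released framework: the helper names (`encode_image`, `verification_accuracy`) and the exact prompt wording are assumptions, and `ask_model` stands in for whatever MLLM backend is being benchmarked.

```python
import base64
from typing import Callable, Iterable, Tuple

def encode_image(path: str) -> str:
    # Read image bytes and base64-encode them, the form most MLLM APIs expect.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

# Hypothetical verification prompt; the actual benchmark prompt may differ.
VERIFICATION_PROMPT = (
    "Do these two face images show the same person? Answer only 'yes' or 'no'."
)

def parse_answer(text: str) -> bool:
    # Map free-form model output to a binary same/different decision.
    return text.strip().lower().startswith("yes")

def verification_accuracy(
    pairs: Iterable[Tuple[str, str, bool]],
    ask_model: Callable[[str, str, str], str],
) -> float:
    # pairs: (encoded_image_a, encoded_image_b, is_same_identity) triples,
    # as in LFW-style verification splits. ask_model(prompt, img_a, img_b)
    # is any MLLM backend returning a text answer.
    correct = total = 0
    for img_a, img_b, is_same in pairs:
        answer = parse_answer(ask_model(VERIFICATION_PROMPT, img_a, img_b))
        correct += int(answer == is_same)
        total += 1
    return correct / total if total else 0.0
```

Because the model interface is a plain callable, the same loop can score any open-source MLLM against any verification-style dataset without per-model code changes.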
📝 Abstract
Multimodal large language models (MLLMs) have achieved remarkable performance across diverse vision-and-language tasks. However, their potential for face recognition remains underexplored. In particular, open-source MLLMs have yet to be evaluated against existing face recognition models on standard benchmarks under comparable protocols. In this work, we present a systematic benchmark of state-of-the-art MLLMs on several standard face recognition datasets, including LFW, CALFW, CPLFW, CFP, AgeDB, and RFW. Experimental results reveal that while MLLMs capture rich semantic cues useful for face-related tasks, they lag behind specialized models in zero-shot, high-precision recognition scenarios. This benchmark provides a foundation for advancing MLLM-based face recognition, offering insights for the design of next-generation models with higher accuracy and stronger generalization. The source code of our benchmark is publicly available on the project page.