🤖 AI Summary
This study systematically evaluates visual fairness disparities in mainstream large vision-language models (LVLMs) across demographic attributes—including gender, skin tone, age, and race. Method: Leveraging benchmarks such as FACET and UTKFace, we conduct multimodal fairness auditing via zero-shot direct question-answering and multiple-choice prompting. We further propose a novel chain-of-thought (CoT)-driven general fairness mitigation strategy, designed to be model-agnostic—applicable to both open- and closed-source LVLMs—while enhancing decision transparency and cross-architectural scalability. Contribution/Results: Empirical analysis reveals significant cross-group performance gaps across all evaluated LVLMs. Our CoT-based approach achieves an average 12.7% improvement across multiple fairness metrics, substantially mitigating bias. The work establishes a reproducible, transferable methodological framework for fairness governance in multimodal AI systems.
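The auditing step described above boils down to measuring per-group performance and the spread between the best- and worst-served groups. A minimal sketch of such a cross-group accuracy-gap computation is below; the group labels, records, and the max-min gap metric are illustrative assumptions, not the authors' exact protocol or metrics.

```python
# Hypothetical sketch: auditing per-group accuracy disparity for an LVLM's
# zero-shot answers. The example records and gap definition are assumptions
# for illustration only.
from collections import defaultdict

def accuracy_gap(records):
    """records: iterable of (group, prediction, ground_truth) tuples.
    Returns per-group accuracy and the max-min gap across groups."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, pred, truth in records:
        total[group] += 1
        correct[group] += int(pred == truth)
    per_group = {g: correct[g] / total[g] for g in total}
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap

# Toy audit: two demographic groups, answers to an occupation question.
records = [
    ("group_a", "doctor", "doctor"),
    ("group_a", "doctor", "doctor"),
    ("group_b", "nurse", "doctor"),
    ("group_b", "doctor", "doctor"),
]
per_group, gap = accuracy_gap(records)  # group_a: 1.0, group_b: 0.5, gap: 0.5
```

A nonzero gap flags a cross-group disparity of the kind the study reports; in practice one would aggregate over a full benchmark such as FACET rather than a handful of examples.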
📝 Abstract
Large vision-language models (LVLMs) have recently made significant progress, demonstrating strong capabilities in open-world visual understanding. However, it remains unclear how LVLMs handle demographic biases in real-world settings, especially disparities across attributes such as gender, skin tone, age, and race. In this paper, we empirically investigate visual fairness in several mainstream LVLMs by auditing their performance disparities across demographic attributes using public fairness benchmark datasets (e.g., FACET, UTKFace). Our fairness evaluation framework employs direct and single-choice question prompts on visual question-answering/classification tasks. Despite advances in visual understanding, our zero-shot prompting results show that both open-source and closed-source LVLMs continue to exhibit fairness issues across different prompts and demographic groups. Furthermore, we propose a multimodal chain-of-thought (CoT)-based strategy for bias mitigation, applicable to both open-source and closed-source LVLMs. This approach enhances transparency and offers a scalable path to addressing fairness, providing a solid foundation for future bias-reduction efforts.
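The CoT-based mitigation works at the prompt level, which is what makes it applicable to closed-source models accessed only through an API. A minimal sketch of such a prompt wrapper for the single-choice setting is below; the exact wording and option format are assumptions, and the paper's actual prompts may differ.

```python
# Illustrative sketch of a chain-of-thought (CoT) prompt wrapper for bias
# mitigation in single-choice VQA, in the spirit of the strategy described
# in the abstract. The instruction text is a hypothetical example.
def cot_fairness_prompt(question, choices):
    """Wrap a single-choice VQA question with CoT instructions asking the
    model to reason from visual evidence and avoid demographic assumptions."""
    options = "\n".join(f"({chr(65 + i)}) {c}" for i, c in enumerate(choices))
    return (
        f"{question}\n{options}\n"
        "Let's think step by step. First, describe only the visual evidence "
        "relevant to the question. Do not rely on the person's gender, skin "
        "tone, age, or race unless the question explicitly asks about them. "
        "Then give your final answer as a single option letter."
    )

prompt = cot_fairness_prompt(
    "What is the occupation of the person in the image?",
    ["doctor", "nurse", "teacher"],
)
```

Because the mitigation lives entirely in the prompt, the same wrapper can be sent unchanged to open-source and closed-source LVLMs, which is the model-agnostic property the abstract emphasizes.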