🤖 AI Summary
This work identifies social bias in large vision-language models (LVLMs) triggered by image-level demographic attributes, such as perceived race and gender, that systematically distort generated text. We propose the first image-level counterfactual intervention framework: by editing social attributes in input images, we elicit over 57 million model responses. From these responses, we construct a multidimensional bias evaluation suite measuring toxicity, ability-related lexical distribution, stereotype semantic similarity, and numerical rating deviation, which enables disentangling the vision- and language-modality contributions to bias. Experiments reveal that mainstream LVLMs exhibit up to a 3.8× increase in toxic generation rates for images of specific demographic groups, more than 40% skew in ability-related word distributions, and numerical rating biases of up to 1.2 points on a 5-point scale. This is the first large-scale empirical measurement and systematic attribution of vision-induced bias in LVLMs, establishing a new methodology and benchmark for fairness evaluation.
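To make the evaluation suite concrete, here is a minimal sketch of how such metrics could be computed over model responses grouped by the edited demographic attribute. All function names, the toxicity threshold, and the word-frequency heuristic are our own illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of the multidimensional bias metrics described above.
# Inputs are assumed to be responses grouped by demographic attribute value.
from collections import Counter
import numpy as np

def toxicity_rate(scores, threshold=0.5):
    """Fraction of responses whose toxicity score exceeds a threshold."""
    return float(np.mean(np.asarray(scores) > threshold))

def toxicity_ratio(scores_by_group):
    """Max/min ratio of per-group toxic-generation rates
    (the kind of quantity behind the 3.8x figure)."""
    rates = {g: toxicity_rate(s) for g, s in scores_by_group.items()}
    return max(rates.values()) / max(min(rates.values()), 1e-9)

def competency_word_skew(texts_by_group, competency_words):
    """Relative spread in how often ability-related words appear per group."""
    freqs = {}
    for group, texts in texts_by_group.items():
        counts = Counter(w for t in texts for w in t.lower().split())
        total = sum(counts.values()) or 1
        freqs[group] = sum(counts[w] for w in competency_words) / total
    hi, lo = max(freqs.values()), min(freqs.values())
    return (hi - lo) / max(hi, 1e-9)

def rating_deviation(ratings_by_group):
    """Largest gap in mean numerical ratings between any two groups
    (e.g., up to 1.2 points on a 5-point scale)."""
    means = {g: float(np.mean(r)) for g, r in ratings_by_group.items()}
    return max(means.values()) - min(means.values())
```

Because every metric is computed across counterfactual groups that differ only in the edited image attribute, any gap can be attributed to the vision modality rather than the text prompt.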
📝 Abstract
With the advent of Large Language Models (LLMs) possessing increasingly impressive capabilities, a number of Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs. Such models condition generated text on both an input image and a text prompt, enabling use cases such as visual question answering and multimodal chat. While prior studies have examined the social biases in text generated by LLMs, this topic remains relatively unexplored for LVLMs. Examining social biases in LVLMs is particularly challenging because the text and visual modalities make confounding contributions to bias in the generated output. To address this problem, we conduct a large-scale study of text generated by different LVLMs under counterfactual changes to input images, producing over 57 million responses from popular models. Our multi-dimensional bias evaluation framework reveals that social attributes depicted in images, such as perceived race, gender, and physical characteristics, can significantly influence the generation of toxic content, competency-associated words, harmful stereotypes, and numerical ratings of individuals.
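As a rough illustration of the counterfactual protocol the abstract describes, the sketch below pairs each text prompt with images that differ only in a single edited social attribute, so that output differences are attributable to the visual input. Here `edit_attribute` and `lvlm_generate` are hypothetical stand-ins for an image-editing model and the LVLM under test, and the attribute and prompt lists are placeholders, not the paper's actual interface or data.

```python
# Hypothetical counterfactual generation loop (not the paper's code).
from itertools import product

ATTRIBUTES = {"perceived_race": ["group_1", "group_2"],
              "perceived_gender": ["group_a", "group_b"]}
PROMPTS = ["Describe this person.",
           "Rate this person's competence from 1 to 5."]

def counterfactual_responses(base_image, edit_attribute, lvlm_generate):
    """Query the LVLM on counterfactual variants of one base image that
    differ only in a single social attribute, for each prompt."""
    results = []
    for (attr, values), prompt in product(ATTRIBUTES.items(), PROMPTS):
        for value in values:
            image = edit_attribute(base_image, attr, value)  # counterfactual edit
            text = lvlm_generate(image, prompt)
            results.append({"attribute": attr, "value": value,
                            "prompt": prompt, "response": text})
    return results
```

Repeating this loop over many base images and models is what yields a response corpus at the scale reported above, which can then be scored with the bias metrics sketched earlier.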