Interpreting Social Bias in LVLMs via Information Flow Analysis and Multi-Round Dialogue Evaluation

📅 2025-05-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large vision-language models (LVLMs) exhibit systematic social biases by implicitly associating neutral multimodal tasks with sensitive human attributes, yet the underlying mechanisms remain unclear. Method: This work introduces the first integrated framework combining information-flow attribution analysis with multi-turn controlled dialogue evaluation to trace the origins of bias. It further quantifies cross-modal semantic alignment in CLIP embedding space, measuring how visual inputs trigger bias-induced semantic clustering of sensitive attributes in text representations. Contribution/Results: The analysis identifies that bias stems from implicit encoding of sensitive attributes in image tokens and their dynamic propagation across reasoning steps, revealing intrinsic modality imbalance and bias emergence within multi-step inference. It also demonstrates that visual inputs systematically distort textual semantic distributions toward biased attribute associations. This study establishes the first interpretable, mechanism-aware framework for bias provenance that jointly accounts for internal model dynamics and cross-modal consistency.

📝 Abstract
Large Vision Language Models (LVLMs) have achieved remarkable progress in multimodal tasks, yet they also exhibit notable social biases. These biases often manifest as unintended associations between neutral concepts and sensitive human attributes, leading to disparate model behaviors across demographic groups. While existing studies primarily focus on detecting and quantifying such biases, they offer limited insight into the underlying mechanisms within the models. To address this gap, we propose an explanatory framework that combines information flow analysis with multi-round dialogue evaluation, aiming to understand the origin of social bias from the perspective of imbalanced internal information utilization. Specifically, we first identify high-contribution image tokens involved in the model's reasoning process for neutral questions via information flow analysis. Then, we design a multi-turn dialogue mechanism to evaluate the extent to which these key tokens encode sensitive information. Extensive experiments reveal that LVLMs exhibit systematic disparities in information usage when processing images of different demographic groups, suggesting that social bias is deeply rooted in the model's internal reasoning dynamics. Furthermore, we complement our findings from a textual modality perspective, showing that the model's semantic representations already display biased proximity patterns, thereby offering a cross-modal explanation of bias formation.
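The cross-modal analysis described above measures whether a neutral concept's text embedding sits closer to one sensitive attribute than another. As a minimal sketch of that proximity measurement, the snippet below compares cosine similarities between a neutral-concept vector and sensitive-attribute vectors; the toy 4-dimensional vectors and the names `attribute_A`/`attribute_B` are illustrative stand-ins, not the paper's actual CLIP features or attribute categories.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def attribute_proximity(concept_vec, attribute_vecs):
    # Similarity of a neutral-concept embedding to each sensitive-attribute
    # embedding; an asymmetric gap between scores would indicate the biased
    # proximity pattern the paper describes.
    return {name: cosine_similarity(concept_vec, vec)
            for name, vec in attribute_vecs.items()}

# Toy embeddings standing in for CLIP text features (hypothetical values).
concept = np.array([0.9, 0.1, 0.0, 0.2])            # e.g. a neutral occupation term
attributes = {
    "attribute_A": np.array([0.8, 0.2, 0.1, 0.1]),  # hypothetical sensitive attribute
    "attribute_B": np.array([0.1, 0.9, 0.3, 0.0]),  # hypothetical sensitive attribute
}
scores = attribute_proximity(concept, attributes)
```

In a real setup the vectors would come from a CLIP text encoder, and the gap between per-attribute scores, aggregated over many neutral concepts, would quantify the biased proximity pattern.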
Problem

Research questions and friction points this paper is trying to address.

Analyzing social bias origins in LVLMs via information flow
Evaluating biased internal reasoning dynamics across demographics
Exploring cross-modal bias formation in vision-language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Information flow analysis identifies key image tokens
Multi-round dialogue evaluates sensitive information encoding
Cross-modal analysis reveals biased semantic representations
Zhengyang Ji — Shandong University
Yifan Jia — The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China; Shandong University, Qingdao, China
Shang Gao — The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Yutao Yue — The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China; Institute of Deep Perception Technology, JITRI, Wuxi, China