🤖 AI Summary
Large vision-language models (LVLMs) exhibit systemic social biases by implicitly associating neutral multimodal tasks with sensitive human attributes, yet the underlying mechanisms remain unclear.
Method: This work introduces the first integrated framework combining information-flow attribution analysis with multi-turn controlled dialogue evaluation to trace the origins of bias. It further quantifies cross-modal semantic alignment in CLIP embedding space, measuring how visual inputs induce semantic clustering of sensitive attributes in the text representations.
Contribution/Results: We identify that bias stems from implicit encoding of sensitive attributes in image tokens and their dynamic propagation across reasoning steps, revealing intrinsic modality imbalance and bias emergence within multi-step inference. Our analysis demonstrates that visual inputs systematically distort textual semantic distributions toward biased attribute associations. This study establishes the first interpretable, mechanism-aware framework for tracing bias origins that jointly accounts for internal model dynamics and cross-modal consistency.
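The cross-modal alignment idea can be pictured as checking how strongly an image nudges a neutral task description toward sensitive-attribute phrasings in CLIP embedding space. Below is a minimal sketch using the Hugging Face `transformers` CLIP API; the model checkpoint, prompt lists, image path, and the "attribute gap" score are illustrative assumptions, not the paper's exact protocol.

```python
# Hedged sketch: compare an image's CLIP similarity to a neutral prompt
# versus hypothetical sensitive-attribute prompts. Prompts and scoring
# are illustrative, not the paper's actual measurement.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

neutral_prompts = ["a person preparing a meal"]                  # neutral task description
attribute_prompts = ["a woman preparing a meal",                 # hypothetical attribute variants
                     "a man preparing a meal"]

image = Image.open("example.jpg")  # placeholder image path

inputs = processor(text=neutral_prompts + attribute_prompts,
                   images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])

# Cosine similarity between the image and each prompt.
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
sims = (image_emb @ text_emb.T).squeeze(0)

neutral_sim = sims[0].item()
# A large positive gap suggests the image embedding aligns more with a
# specific sensitive-attribute phrasing than with the neutral description.
attribute_gap = sims[1:].max().item() - neutral_sim
print(f"neutral={neutral_sim:.3f}, attribute gap={attribute_gap:.3f}")
```

Aggregating such gaps over images of different demographic groups would expose biased proximity patterns of the kind the summary describes; the aggregation itself is left out here for brevity.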
📝 Abstract
Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal tasks, yet they also exhibit notable social biases. These biases often manifest as unintended associations between neutral concepts and sensitive human attributes, leading to disparate model behaviors across demographic groups. While existing studies primarily focus on detecting and quantifying such biases, they offer limited insight into the underlying mechanisms within the models. To address this gap, we propose an explanatory framework that combines information flow analysis with multi-turn dialogue evaluation, aiming to understand the origin of social bias from the perspective of imbalanced internal information utilization. Specifically, we first identify high-contribution image tokens involved in the model's reasoning process for neutral questions via information flow analysis. Then, we design a multi-turn dialogue mechanism to evaluate the extent to which these key tokens encode sensitive information. Extensive experiments reveal that LVLMs exhibit systematic disparities in information usage when processing images of different demographic groups, suggesting that social bias is deeply rooted in the model's internal reasoning dynamics. Furthermore, we complement our findings from a textual modality perspective, showing that the model's semantic representations already display biased proximity patterns, thereby offering a cross-modal explanation of bias formation.
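To make the "high-contribution image tokens" step concrete, here is a minimal sketch of one common attribution choice: ranking image-token positions by the attention mass that the generated answer tokens direct at them, averaged over heads and layers. The layer/head aggregation and the top-k selection are assumptions for illustration; the paper's information flow analysis may use a different attribution rule.

```python
# Hedged sketch: rank image tokens by the attention the answer tokens pay
# to them. This is one simple proxy for information flow, not necessarily
# the attribution method used in the paper.
import torch

def top_image_tokens(attentions, image_span, answer_positions, k=8):
    """attentions: list of [batch, heads, seq, seq] tensors, one per layer.
    image_span: (start, end) index range of image tokens in the input sequence.
    answer_positions: query positions of the generated answer tokens.
    Returns the indices (and scores) of the k most-attended image tokens."""
    start, end = image_span
    scores = torch.zeros(end - start)
    for layer_attn in attentions:
        attn = layer_attn[0].mean(dim=0)                    # mean over heads -> [seq, seq]
        scores += attn[answer_positions, start:end].sum(dim=0)  # answer -> image attention
    scores /= len(attentions)                               # average over layers
    topk = torch.topk(scores, k=min(k, scores.numel()))
    return (topk.indices + start).tolist(), topk.values.tolist()

# Toy example: 2 layers, 4 heads, 20 tokens; image tokens at positions 1..10,
# answer tokens at positions 17..19 (all values random, for shape checking only).
dummy = [torch.rand(1, 4, 20, 20).softmax(dim=-1) for _ in range(2)]
idx, vals = top_image_tokens(dummy, image_span=(1, 11),
                             answer_positions=[17, 18, 19], k=3)
print("high-contribution image tokens:", idx)
```

In the framework described above, tokens ranked this way would then be probed through the multi-turn dialogue mechanism to test how much sensitive-attribute information they encode.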