COBIAS: Assessing the Contextual Reliability of Bias Benchmarks for Language Models

📅 2024-02-22
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing LLM bias evaluation benchmarks neglect contextual influence, leading to inaccurate assessments. Method: We propose a Contextual Reliability Evaluation Framework that formally defines and quantifies the "contextual reliability" of biased statements, introducing the COBIAS metric. Our approach models contextual robustness via behavioral variance, validates results through human annotation and Spearman correlation analysis (ρ = 0.65, p < 10⁻⁵⁹), and enhances generalizability via cross-context data augmentation (2,291 statements). Contribution/Results: COBIAS achieves strong alignment with human judgments and empirically demonstrates that context critically modulates bias detection performance, challenging the static-assumption foundation of current benchmarks. The framework identifies high-reliability instances, enabling construction of more robust, context-sensitive bias evaluation benchmarks.

📝 Abstract
Large Language Models (LLMs) often inherit biases from the web data they are trained on, which contains stereotypes and prejudices. Current methods for evaluating and mitigating these biases rely on bias-benchmark datasets. These benchmarks measure bias by observing an LLM's behavior on biased statements. However, these statements fail to account for the situations they try to present. To address this, we introduce a contextual reliability framework, which evaluates model robustness to biased statements by considering the various contexts in which they may appear. We develop the Context-Oriented Bias Indicator and Assessment Score (COBIAS) to measure a biased statement's reliability in detecting bias, based on the variance in model behavior across different contexts. To evaluate the metric, we augmented 2,291 stereotyped statements from two existing benchmark datasets by adding contextual information. We show that COBIAS aligns with human judgment on the contextual reliability of biased statements (Spearman's $\rho = 0.65$, $p = 3.4 \times 10^{-60}$) and can be used to create reliable benchmarks, which would support bias-mitigation efforts.
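The paper's exact scoring formula is not reproduced here, but its core idea — a statement is a reliable bias probe when the model's behavior on it varies little across contexts — can be sketched minimally as follows. The function name, score scale, and example numbers are illustrative assumptions, not the authors' implementation.

```python
import statistics

def context_variance_score(model_scores):
    """Illustrative COBIAS-style reliability signal (assumption, not
    the paper's formula): given a model's bias score for the same base
    statement embedded in several different contexts, low variance
    suggests the statement probes bias consistently across contexts,
    while high variance suggests its bias signal is context-dependent."""
    if len(model_scores) < 2:
        raise ValueError("need at least two context variants")
    return statistics.pvariance(model_scores)

# Hypothetical scores for a statement whose signal barely shifts with context:
stable = context_variance_score([0.81, 0.79, 0.80, 0.82])
# Hypothetical scores for a statement whose signal swings with context:
unstable = context_variance_score([0.10, 0.90, 0.30, 0.75])
assert stable < unstable
```

In this sketch, lower variance marks the "high-reliability instances" the framework would keep when constructing a context-sensitive benchmark.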
Problem

Research questions and friction points this paper is trying to address.

Current bias benchmarks lack contextual reliability when evaluating LLMs
Proposing COBIAS to measure the reliability of biased statements across contexts
Augmenting existing bias benchmarks with contextual information for robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual reliability framework evaluates bias robustness
COBIAS measures bias reliability across different contexts
Augmented benchmark datasets with contextual information