COBIAS: Assessing the Contextual Reliability of Bias Benchmarks for Language Models

📅 2024-02-22
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing LLM bias evaluation benchmarks neglect contextual influence, leading to inaccurate assessments. Method: We propose a Contextual Reliability Evaluation Framework that formally defines and quantifies the "contextual reliability" of biased statements, introducing the COBIAS metric. Our approach models contextual robustness via behavioral variance, validates results through human annotation and Spearman correlation analysis (ρ = 0.65, p < 10⁻⁵⁹), and enhances generalizability via cross-context data augmentation (2,291 statements). Contribution/Results: COBIAS achieves strong alignment with human judgments and empirically demonstrates that context critically modulates bias detection performance, challenging the static-assumption foundation of current benchmarks. The framework identifies high-reliability instances, enabling construction of more robust, context-sensitive bias evaluation benchmarks.

📝 Abstract
Large Language Models (LLMs) often inherit biases from the web data they are trained on, which contains stereotypes and prejudices. Current methods for evaluating and mitigating these biases rely on bias-benchmark datasets. These benchmarks measure bias by observing an LLM's behavior on biased statements. However, these statements fail to account for the situations they try to present. To address this, we introduce a contextual reliability framework, which evaluates model robustness to biased statements by considering the various contexts in which they may appear. We develop the Context-Oriented Bias Indicator and Assessment Score (COBIAS) to measure a biased statement's reliability in detecting bias, based on the variance in model behavior across different contexts. To evaluate the metric, we augmented 2,291 stereotyped statements from two existing benchmark datasets by adding contextual information. We show that COBIAS aligns with human judgment on the contextual reliability of biased statements (Spearman's $\rho = 0.65$, $p = 3.4 \times 10^{-60}$) and can be used to create reliable benchmarks, which would support bias-mitigation efforts.
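The paper's exact scoring formula is not reproduced here, but its core idea — a statement is a reliable bias probe when the model's behavior on it varies little across contexts — can be sketched minimally as follows. The function name, score scale, and example numbers are illustrative assumptions, not the authors' implementation.

```python
import statistics

def context_variance_score(model_scores):
    """Illustrative COBIAS-style reliability signal (assumption, not
    the paper's formula): given a model's bias score for the same base
    statement embedded in several different contexts, low variance
    suggests the statement probes bias consistently across contexts,
    while high variance suggests its bias signal is context-dependent."""
    if len(model_scores) < 2:
        raise ValueError("need at least two context variants")
    return statistics.pvariance(model_scores)

# Hypothetical scores for a statement whose signal barely shifts with context:
stable = context_variance_score([0.81, 0.79, 0.80, 0.82])
# Hypothetical scores for a statement whose signal swings with context:
unstable = context_variance_score([0.10, 0.90, 0.30, 0.75])
assert stable < unstable
```

In this sketch, lower variance marks the "high-reliability instances" the framework would keep when constructing a context-sensitive benchmark.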
Problem

Research questions and friction points this paper is trying to address.

Current bias benchmarks lack contextual reliability when evaluating LLMs
Proposing COBIAS to measure the reliability of biased statements across contexts
Augmenting existing bias benchmarks with contextual information for robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual reliability framework evaluates bias robustness
COBIAS measures bias reliability across different contexts
Augmented benchmark datasets with contextual information