🤖 AI Summary
Current LLM inference is vulnerable to multiple co-occurring biases, yet mainstream benchmarks evaluate only single-bias scenarios and cannot assess or mitigate several biases jointly. To address this gap, we introduce MultiBiasBench, the first benchmark in which five bias types are present concurrently, and systematically show that state-of-the-art LLMs and debiasing methods fail significantly under joint multi-bias intervention. Building on these findings, we propose Causal-effect-guided Multi-Bias Elimination (CMBE), a debiasing framework that performs semantically grounded, disentangled intervention against multiple biases via inverse probability weighting (IPW) and double machine learning (DML)-based causal inference, counterfactual reasoning, and bias-sensitive attention masking. CMBE further adopts a multi-task causal disentanglement training paradigm. Experiments show that CMBE achieves an average 32.7% improvement in debiasing performance over SOTA methods on MultiBiasBench while preserving 98.4% of the original task accuracy.
📝 Abstract
Despite significant progress, recent studies indicate that current large language models (LLMs) may still exploit biases during inference, which harms their generalizability. Several benchmarks have been proposed to investigate the generalizability of LLMs, with each data instance typically containing a single type of controlled bias. In practical applications, however, a single instance may contain multiple types of biases. To bridge this gap, we propose a multi-bias benchmark in which each instance contains five types of biases. Evaluations on this benchmark reveal that existing LLMs and debiasing methods perform unsatisfactorily, highlighting the challenge of eliminating multiple types of biases simultaneously. To overcome this challenge, we propose a causal-effect-estimation-guided multi-bias elimination method (CMBE). CMBE first estimates the causal effects of multiple types of biases simultaneously, and then removes these effects from the total causal effect exerted by both the semantic information and the biases during inference. Experimental results show that CMBE can effectively eliminate multiple types of biases simultaneously, enhancing the generalizability of LLMs.
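The abstract's core inference-time step, removing the estimated causal effect of the biases from the total effect of semantics plus biases, can be sketched as a counterfactual subtraction in logit space. This is a minimal illustration of the general idea, not CMBE itself: the logit values, the single scalar `alpha`, and the assumption that the bias-only effect is available as a logit vector are all hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def debias_logits(full_logits, bias_logits, alpha=1.0):
    """Remove the estimated bias-only causal effect (in logit space)
    from the total effect of semantics + biases, keeping the semantic part."""
    return full_logits - alpha * bias_logits

# Toy example over three answer options: the bias pushes mass toward option 0.
full = np.array([2.0, 1.0, 0.5])        # total effect: semantics + bias
bias_only = np.array([1.5, 0.0, 0.0])   # estimated effect of the bias alone
probs = softmax(debias_logits(full, bias_only))
# After subtraction the debiased logits are [0.5, 1.0, 0.5],
# so option 1 (the semantically supported answer) wins.
```

With multiple bias types, `bias_only` would be replaced by a sum of per-bias effect estimates, which is why estimating them jointly, as the abstract describes, matters.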