Benchmarking and Pushing the Multi-Bias Elimination Boundary of LLMs via Causal Effect Estimation-guided Debiasing

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLMs can rely on multiple co-occurring biases during inference, yet existing generalizability benchmarks control only a single bias type per example. To close this gap, the authors build a multi-bias benchmark in which every example contains five types of biases, and their evaluations show that both state-of-the-art LLMs and existing debiasing methods perform poorly when multiple biases must be eliminated at once. They then propose CMBE, a causal effect estimation-guided multi-bias elimination method: it first estimates the causal effects of the multiple bias types simultaneously, and then removes those effects from the total causal effect that semantic information and biases jointly exert on the prediction. Experiments show that CMBE effectively eliminates multiple types of bias simultaneously and improves the generalizability of LLMs.

📝 Abstract
Despite significant progress, recent studies have indicated that current large language models (LLMs) may still utilize bias during inference, leading to poor generalizability. Several benchmarks have been proposed to investigate the generalizability of LLMs, with each piece of data typically containing one type of controlled bias. In practical applications, however, a single piece of data may contain multiple types of biases. To bridge this gap, we propose a multi-bias benchmark where each piece of data contains five types of biases. Evaluations conducted on this benchmark reveal that the performance of existing LLMs and debiasing methods is unsatisfactory, highlighting the challenge of eliminating multiple types of biases simultaneously. To overcome this challenge, we propose a causal effect estimation-guided multi-bias elimination method (CMBE). This method first estimates the causal effect of multiple types of biases simultaneously. Subsequently, we eliminate the causal effect of the biases from the total causal effect exerted by both the semantic information and the biases during inference. Experimental results show that CMBE can effectively eliminate multiple types of bias simultaneously and thereby enhance the generalizability of LLMs.
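The core idea of subtracting the estimated causal effect of biases from the total causal effect can be sketched in logit space. The snippet below is a minimal illustration, not the paper's actual implementation: it assumes the bias effect is approximated by a second forward pass on a bias-only input (e.g. the example with its semantic content masked), and the names `debias_logits` and `strength` are hypothetical.

```python
def debias_logits(total_logits, bias_logits, strength=1.0):
    """Illustrative causal-effect subtraction for debiased inference.

    total_logits: model logits on the full input, reflecting the total
                  causal effect of both semantics and biases.
    bias_logits:  model logits on a bias-only input, serving as a rough
                  estimate of the biases' causal effect.
    strength:     hypothetical scaling factor for the bias effect.
    """
    return [t - strength * b for t, b in zip(total_logits, bias_logits)]


# Toy example: label 0 is favored only because of bias cues.
total = [2.0, 1.5]      # logits on the full input
bias_only = [1.8, 0.2]  # logits when only bias cues remain
debiased = debias_logits(total, bias_only)
# After subtraction, the semantically supported label 1 wins.
```

In this toy setting the bias-driven advantage of label 0 is removed and the prediction flips to the label supported by the semantics, which is the intended behavior of subtracting bias effects from the total effect.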
Problem

Research questions and friction points this paper is trying to address.

LLMs still use bias during inference, harming generalizability
Existing benchmarks lack data with multiple simultaneous biases
Current debiasing methods struggle with multi-bias elimination
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-bias benchmark with five bias types
Causal effect estimation-guided debiasing method
Simultaneous elimination of multiple bias types
Authors

Zhouhao Sun, Harbin Institute of Technology (NLP)
Zhiyuan Kan, Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China
Xiao Ding, Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China
Li Du, Beijing Academy of Artificial Intelligence, Beijing, China
Yang Zhao, Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China
Bing Qin, Professor, Harbin Institute of Technology (Natural Language Processing, Information Extraction, Sentiment Analysis)
Ting Liu, Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China