🤖 AI Summary
Current LLM inference is vulnerable to multiple co-occurring biases, yet mainstream benchmarks evaluate only single-bias scenarios and cannot assess or mitigate several biases jointly. To address this gap, we introduce MultiBiasBench, the first benchmark in which five bias types are present concurrently, and systematically show that state-of-the-art LLMs and debiasing methods fail significantly under joint multi-bias intervention. Building on these findings, we propose Causal-effect-guided Multi-Bias Elimination (CMBE), a debiasing framework that performs semantically grounded, disentangled intervention against multiple biases via inverse probability weighting (IPW) and double machine learning (DML)-based causal inference, counterfactual reasoning, and bias-sensitive attention masking. CMBE further adopts a multi-task causal disentanglement training paradigm. Experiments show that CMBE achieves an average 32.7% improvement in debiasing performance over SOTA methods on MultiBiasBench while preserving 98.4% of the original task accuracy.
📝 Abstract
Despite significant progress, recent studies indicate that current large language models (LLMs) may still exploit biases during inference, which harms their generalizability. Several benchmarks have been proposed to investigate the generalizability of LLMs, with each data instance typically containing a single type of controlled bias. In practical applications, however, a single instance may contain multiple types of biases. To bridge this gap, we propose a multi-bias benchmark in which each instance contains five types of biases. Evaluations on this benchmark reveal that existing LLMs and debiasing methods perform unsatisfactorily, highlighting the challenge of eliminating multiple types of biases simultaneously. To overcome this challenge, we propose a causal-effect-estimation-guided multi-bias elimination method (CMBE). CMBE first estimates the causal effects of multiple types of biases simultaneously, and then removes these effects from the total causal effect exerted by both the semantic information and the biases during inference. Experimental results show that CMBE can effectively eliminate multiple types of biases simultaneously, enhancing the generalizability of LLMs.
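The abstract's core inference-time step, removing the estimated causal effect of the biases from the total effect of semantics plus biases, can be sketched as a counterfactual subtraction in logit space. This is a minimal illustration of the general idea, not CMBE itself: the logit values, the single scalar `alpha`, and the assumption that the bias-only effect is available as a logit vector are all hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def debias_logits(full_logits, bias_logits, alpha=1.0):
    """Remove the estimated bias-only causal effect (in logit space)
    from the total effect of semantics + biases, keeping the semantic part."""
    return full_logits - alpha * bias_logits

# Toy example over three answer options: the bias pushes mass toward option 0.
full = np.array([2.0, 1.0, 0.5])        # total effect: semantics + bias
bias_only = np.array([1.5, 0.0, 0.0])   # estimated effect of the bias alone
probs = softmax(debias_logits(full, bias_only))
# After subtraction the debiased logits are [0.5, 1.0, 0.5],
# so option 1 (the semantically supported answer) wins.
```

With multiple bias types, `bias_only` would be replaced by a sum of per-bias effect estimates, which is why estimating them jointly, as the abstract describes, matters.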