Neutralizing Bias in LLM Reasoning using Entailment Graphs

📅 2025-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit attestation bias in natural language inference (NLI), over-relying on propositional memory and thereby generating frequent hallucinations. To address this, we propose an unsupervised counterfactual reasoning framework that constructs a bias-adversarial NLI evaluation paradigm via predicate randomization—enabling, for the first time, fully automated counterfactual data generation and bias-neutral fine-tuning without human annotation. Our method integrates entailment graph modeling, bias-aware fine-tuning, and bias-adversarial data design to systematically attenuate model dependence on superficial lexical patterns. Experiments demonstrate that our approach significantly reduces NLI hallucination rates while consistently improving inference accuracy and robustness on both original and bias-neutralized test sets. This work establishes a scalable, low-resource paradigm for mitigating reasoning biases in LLMs.

📝 Abstract
LLMs are often claimed to be capable of Natural Language Inference (NLI), which is widely regarded as a cornerstone of more complex forms of reasoning. However, recent work shows that LLMs still suffer from hallucinations in NLI due to attestation bias, where LLMs overly rely on propositional memory to build shortcuts. To address this issue, we design an unsupervised framework to construct counterfactual reasoning data and fine-tune LLMs to reduce attestation bias. To measure bias reduction, we build bias-adversarial variants of NLI datasets with randomly replaced predicates in premises while keeping hypotheses unchanged. Extensive evaluations show that our framework can significantly reduce hallucinations from attestation bias. We then further evaluate LLMs fine-tuned with our framework on original NLI datasets and their bias-neutralized versions, where original entities are replaced with randomly sampled ones. Extensive results show that our framework consistently improves inferential performance on both original and bias-neutralized NLI datasets.
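The abstract's bias-adversarial construction can be illustrated with a minimal sketch: the premise's predicate is swapped for a randomly sampled one while the hypothesis stays fixed, so memorized facts about the original proposition no longer support the inference. The predicate pool, function name, and example sentence pair below are illustrative assumptions, not the authors' actual data or code.

```python
import random

# Toy predicate vocabulary (assumption; the paper draws predicates from
# its own NLI datasets, not from a fixed hand-written list).
PREDICATE_POOL = ["acquired", "criticized", "visited", "sued"]

def randomize_predicate(premise: str, predicate: str, rng: random.Random) -> str:
    """Swap the given predicate in the premise for a different random one."""
    candidates = [p for p in PREDICATE_POOL if p != predicate]
    return premise.replace(predicate, rng.choice(candidates), 1)

rng = random.Random(0)
pair = {
    "premise": "Google acquired YouTube in 2006.",
    "hypothesis": "Google owns YouTube.",
}
adversarial = {
    # Premise predicate is randomized ...
    "premise": randomize_predicate(pair["premise"], "acquired", rng),
    # ... while the hypothesis is kept unchanged, per the abstract.
    "hypothesis": pair["hypothesis"],
}
```

A model relying on attested facts ("Google acquired YouTube") rather than the stated premise will still predict entailment for the adversarial pair, which is exactly the hallucination this evaluation is designed to expose.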
Problem

Research questions and friction points this paper is trying to address.

Reducing hallucinations in LLMs due to attestation bias
Constructing counterfactual reasoning data to fine-tune LLMs
Improving inferential performance on bias-neutralized NLI datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised framework for counterfactual reasoning data
Fine-tuning LLMs to reduce attestation bias
Bias-adversarial NLI datasets with random predicates
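The bias-neutralized evaluation sets mentioned above can likewise be sketched: original entities are consistently replaced with randomly sampled ones in both premise and hypothesis, so the entailment label is preserved while memorized facts about the entities become useless. The entity pool, function name, and example pair are illustrative assumptions.

```python
import random

# Toy entity pool (assumption; the paper samples replacement entities,
# but this specific list is invented for illustration).
ENTITY_POOL = ["Acme Corp", "Zylon Inc", "Nordalia", "Quixville"]

def neutralize_entities(premise: str, hypothesis: str, entities: list[str],
                        rng: random.Random) -> tuple[str, str]:
    """Consistently replace each original entity with a distinct sampled one."""
    substitutes = rng.sample(ENTITY_POOL, k=len(entities))
    for old, new in zip(entities, substitutes):
        premise = premise.replace(old, new)
        hypothesis = hypothesis.replace(old, new)
    return premise, hypothesis

rng = random.Random(1)
p, h = neutralize_entities(
    "Google acquired YouTube.",
    "Google owns YouTube.",
    entities=["Google", "YouTube"],
    rng=rng,
)
```

Because the same substitution is applied to both sentences, the entailment relation between the predicates ("acquired" entails "owns") is untouched; only the attested-fact shortcut is removed.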