Inference-Time Selective Debiasing to Enhance Fairness in Text Classification Models

📅 2024-07-27

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of achieving both high predictive accuracy and fairness in text classification models without retraining. The authors propose a dynamic, inference-time selective debiasing method with three key components: (1) the first adaptation of selective classification to fairness-aware prediction; (2) a KL-divergence-based bias quantification mechanism that identifies biased instances more precisely than conventional uncertainty estimation; and (3) lightweight post-hoc debiasing via LEACE (LEAst-squares Concept Erasure). The method requires no architectural modifications or changes to the training pipeline. Evaluated on multiple benchmarks, it improves fairness metrics by 12–28% while incurring only marginal accuracy degradation. As a result, it narrows the performance gap between post-processing debiasing methods and the more costly at-training and pre-processing approaches.

📝 Abstract

We propose selective debiasing -- an inference-time safety mechanism designed to enhance the overall model quality in terms of prediction performance and fairness, especially in scenarios where retraining the model is impractical. The method draws inspiration from selective classification, where at inference time, predictions with low quality, as indicated by their uncertainty scores, are discarded. In our approach, we identify the potentially biased model predictions and, instead of discarding them, we remove bias from these predictions using LEACE -- a post-processing debiasing method. To select problematic predictions, we propose a bias quantification approach based on KL divergence, which achieves better results than standard uncertainty quantification methods. Experiments on text classification datasets with encoder-based classification models demonstrate that selective debiasing helps to reduce the performance gap between post-processing methods and debiasing techniques from the at-training and pre-processing categories.
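The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which distributions the KL divergence compares, so this example assumes the score is the divergence between the original model's class probabilities and those obtained after post-hoc debiasing. The function names, the `fraction` parameter, and the toy probabilities are hypothetical and only serve the illustration; this is not the authors' implementation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Row-wise KL(p || q) for arrays of class probabilities."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)

def selective_debias(p_original, p_debiased, fraction=0.2):
    """Replace the most 'bias-suspect' predictions with debiased ones.

    Assumption: a large divergence between the original and debiased
    output distributions marks a prediction as potentially biased.
    """
    scores = kl_divergence(p_original, p_debiased)
    k = int(round(fraction * len(scores)))
    selected = np.zeros(len(scores), dtype=bool)
    if k > 0:
        # Indices of the k highest-scoring instances.
        selected[np.argsort(scores)[-k:]] = True
    # Keep original predictions everywhere except the selected rows.
    final = np.where(selected[:, None], p_debiased, p_original)
    return final, selected, scores

# Toy probabilities for four instances (made up for illustration).
p_original = np.array([[0.90, 0.10], [0.60, 0.40], [0.20, 0.80], [0.55, 0.45]])
p_debiased = np.array([[0.88, 0.12], [0.30, 0.70], [0.21, 0.79], [0.50, 0.50]])

final, selected, scores = selective_debias(p_original, p_debiased, fraction=0.25)
```

Unlike selective classification, which would discard the selected instances, this sketch keeps every instance and only swaps in the debiased prediction where the score is highest. In practice `p_debiased` would come from running the classifier on representations processed by a debiasing method such as LEACE.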

Problem

Research questions and friction points this paper is trying to address.

Enhance fairness in text classification

Selective debiasing at inference time

Quantify bias using KL divergence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Inference-time selective debiasing

LEACE post-processing debiasing

KL divergence bias quantification

🔎 Similar Papers

From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings