🤖 AI Summary
This work addresses implicit social biases in large language models (LLMs) during in-context learning on tabular data. We propose the first fairness-aware demonstration selection framework grounded in latent concept variables. Methodologically, we employ a lightweight internal LLM to learn disentangled latent representations of sensitive attributes (e.g., gender, race), enabling sensitive-attribute decorrelation via data augmentation and hierarchical demonstration selection; the internal and external LLMs thereby reason collaboratively. The framework jointly optimizes predictive utility and fairness: across multiple tabular benchmarks it substantially improves fairness metrics, reducing the Equalized Odds difference by 37% on average, while maintaining accuracy comparable to strong baselines. Our core contributions are threefold: (1) the first integration of latent concept disentanglement into LLM-based tabular in-context learning; (2) a resource-efficient, transferable fairness modeling approach; and (3) empirical validation that the method mitigates bias without sacrificing performance.
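The selection step described above can be sketched minimally. This is an illustrative reconstruction, not the paper's implementation: it assumes the internal LLM exposes a per-example score under the learned latent concept (the `concept_score` callable here is a hypothetical stand-in for that scoring), and simply keeps the top-k candidates for the external LLM's prompt.

```python
def select_demonstrations(candidates, concept_score, k=4):
    """Rank candidate demonstrations by their score under a learned latent
    concept and keep the top-k to place in the external LLM's prompt.

    `concept_score` is a hypothetical stand-in for the internal LLM's
    likelihood under the learned (fairness-aware) latent concept.
    """
    # Higher score = demonstration is more consistent with the latent concept.
    ranked = sorted(candidates, key=concept_score, reverse=True)
    return ranked[:k]
```

For example, with a toy scoring function, `select_demonstrations([3, 1, 2], lambda x: -x, k=2)` returns the two lowest-valued candidates, `[1, 2]`.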
📝 Abstract
The emerging in-context learning (ICL) ability of large language models (LLMs) has prompted their use for predictive tasks across domains and data types, facilitated by serialization methods. However, as these applications reach high-stakes domains, it has been shown that LLMs can inherit social bias and discrimination from their pre-training data. In this work, we investigate this inherent bias in LLMs during in-context learning with tabular data. We focus on an optimal demonstration selection approach that utilizes latent concept variables for resource-efficient task adaptation. We design data augmentation strategies that reduce the correlation between predictive outcomes and sensitive variables, promoting fairness during latent concept learning. We use the learned concept to select demonstrations from a training dataset, obtaining fair predictions during inference while maintaining model utility. The latent concept variable is learned using a smaller internal LLM, and the selected demonstrations can be used for inference with larger external LLMs. We empirically verify that this fair latent variable approach improves fairness on tabular datasets compared to multiple heuristic demonstration selection methods.
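The fairness metric reported above, the Equalized Odds difference, measures the largest gap in true-positive rate or false-positive rate between demographic groups. A minimal sketch of the standard definition (the helper name and interface are our own, not the paper's):

```python
import numpy as np

def equalized_odds_difference(y_true, y_pred, group):
    """Equalized Odds difference for a binary task and a binary group:
    the larger of the TPR gap and the FPR gap between the two groups.

    Illustrative helper based on the standard definition; not the
    paper's code.
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in (0, 1):
        m = group == g
        tpr = y_pred[m & (y_true == 1)].mean()  # P(pred=1 | y=1, group=g)
        fpr = y_pred[m & (y_true == 0)].mean()  # P(pred=1 | y=0, group=g)
        rates[g] = (tpr, fpr)
    tpr_gap = abs(rates[0][0] - rates[1][0])
    fpr_gap = abs(rates[0][1] - rates[1][1])
    return max(tpr_gap, fpr_gap)
```

A value of 0 means predictions satisfy equalized odds exactly; the reported 37% average reduction refers to shrinking this gap relative to baselines.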