Context-aware Fairness Evaluation and Mitigation in LLMs

📅 2025-10-21
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) tend to amplify unfairness, harmful content, and inconsistency in multi-turn dialogues; existing training-time interventions or static pruning methods incur high computational overhead, are irreversible, and lack contextual adaptability. This paper proposes Dynamic Reversible Pruning (DRP), a context-aware inference-time framework that detects bias-correlated neuron activations via contextualized activation profiling and suppresses them through adaptive masking, without altering model parameters. Its key contributions are: (i) the first fine-grained, reversible, and context-driven fairness control mechanism for LLMs; (ii) native support for both monolingual and multilingual, single- and multi-turn dialogue settings; and (iii) significant reduction in unfair behavior incidence while preserving response coherence and factual integrity, as empirically validated across diverse benchmarks.
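The two stages the summary describes, contextualized activation profiling followed by adaptive masking, can be sketched in miniature. This is an illustrative toy, not the paper's implementation: the activation-gap selection criterion, the function names, and the scalar mask are all assumptions.

```python
import numpy as np

def profile_bias_neurons(acts_biased, acts_neutral, top_k=1):
    # Hypothetical profiling step: score each neuron by the mean
    # activation gap between bias-eliciting and neutral contexts,
    # and flag the top-k neurons with the largest absolute gap.
    gap = acts_biased.mean(axis=0) - acts_neutral.mean(axis=0)
    return np.argsort(-np.abs(gap))[:top_k]

def make_mask(n_neurons, flagged, alpha=0.0):
    # Reversible soft mask: flagged neurons are scaled by alpha
    # (alpha=0 suppresses them); all others pass through unchanged.
    # Model parameters are never modified, only activations.
    mask = np.ones(n_neurons)
    mask[flagged] = alpha
    return mask

# Toy activations: 4 samples x 5 neurons.
rng = np.random.default_rng(0)
acts_neutral = rng.normal(0.0, 1.0, (4, 5))
acts_biased = acts_neutral.copy()
acts_biased[:, 2] += 3.0  # neuron 2 fires strongly in biased contexts

flagged = profile_bias_neurons(acts_biased, acts_neutral, top_k=1)
mask = make_mask(5, flagged)
masked = acts_biased * mask  # applied only during generation
```

Because the mask is a separate multiplicative term, dropping it restores the original activations exactly, which is the sense in which the pruning is reversible.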

๐Ÿ“ Abstract
Large language models often display undesirable behaviors embedded in their internal representations, including unfairness, inconsistency drift, amplification of harmful content, and the propagation of unwanted patterns during extended conversations. Although training-time or data-centric methods attempt to reduce these effects, they are computationally expensive, irreversible once deployed, and slow to adapt to new conversational contexts. Pruning-based methods provide a flexible and transparent way to reduce bias by adjusting the neurons responsible for certain behaviors. However, most existing approaches are static; once a neuron is removed, the model loses the ability to adapt when the conversation or context changes. To address this, we propose a dynamic, reversible, pruning-based framework that detects context-aware neuron activations and applies adaptive masking to modulate their influence during generation. Our inference-time solution provides fine-grained, memory-aware mitigation that preserves knowledge and yields more coherent behavior across multilingual single- and multi-turn dialogues, enabling dynamic fairness control in real-world conversational AI.
Problem

Research questions and friction points this paper is trying to address.

Evaluating and mitigating fairness issues in large language models
Addressing static bias mitigation limitations through dynamic neuron pruning
Enabling context-aware fairness control during multilingual conversational generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic reversible pruning framework for bias mitigation
Adaptive neuron masking during inference time
Context-aware fairness control in multilingual dialogues
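The combination the innovation bullets highlight, per-context masking that is fully undone when the context ends, can be illustrated with a context manager around a toy layer. All names here are hypothetical; the point is only that suppression is scoped to one dialogue turn and the layer's state is restored bit-for-bit afterward.

```python
import numpy as np
from contextlib import contextmanager

class MaskedLayer:
    """Toy linear layer whose output can be reversibly masked."""
    def __init__(self, weights):
        self.weights = weights
        self.mask = np.ones(weights.shape[1])

    def forward(self, x):
        return (x @ self.weights) * self.mask

@contextmanager
def adaptive_mask(layer, flagged, alpha=0.0):
    # Apply a context-specific mask for one dialogue turn, then
    # restore the layer exactly: no parameters are ever changed.
    saved = layer.mask.copy()
    layer.mask = saved.copy()
    layer.mask[flagged] = alpha
    try:
        yield layer
    finally:
        layer.mask = saved

layer = MaskedLayer(np.eye(3))
x = np.array([1.0, 2.0, 3.0])

with adaptive_mask(layer, flagged=[1]):
    suppressed = layer.forward(x)  # neuron 1 silenced for this turn
restored = layer.forward(x)        # full output once the context ends
```

In a real LLM the same scoping could be achieved with inference-time hooks on the relevant layers, with the flagged set recomputed per context, per language, or per dialogue turn.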