Efficient Fairness Testing in Large Language Models: Prioritizing Metamorphic Relations for Bias Detection

📅 2025-05-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low detection efficiency and high annotation cost caused by the exponential growth of metamorphic relations (MRs) in fairness testing of large language models (LLMs), this paper proposes an MR prioritization method based on sentence-embedding diversity scoring. It is the first to incorporate sentence-level diversity into MR selection, focusing test cases on bias-sensitive inputs without requiring manual fault annotations. Experimental results show that the method improves the fault detection rate by 22% over a random baseline and by 12% over distance-based prioritization; reduces time-to-first-failure by 15% and 8%, respectively; and achieves 95% of the detection efficacy of exhaustive fault-annotated testing. The approach balances high detection effectiveness with low computational overhead, offering a scalable, low-cost paradigm for automated fairness testing of LLMs.

📝 Abstract
Large Language Models (LLMs) are increasingly deployed in various applications, raising critical concerns about fairness and potential biases in their outputs. This paper explores the prioritization of metamorphic relations (MRs) in metamorphic testing as a strategy to efficiently detect fairness issues within LLMs. Given the exponential growth of possible test cases, exhaustive testing is impractical; therefore, prioritizing MRs based on their effectiveness in detecting fairness violations is crucial. We apply a sentence diversity-based approach to compute and rank MRs to optimize fault detection. Experimental results demonstrate that our proposed prioritization approach improves fault detection rates by 22% compared to random prioritization and 12% compared to distance-based prioritization, while reducing the time to the first failure by 15% and 8%, respectively. Furthermore, our approach performs within 5% of fault-based prioritization in effectiveness, while significantly reducing the computational cost associated with fault labeling. These results validate the effectiveness of diversity-based MR prioritization in enhancing fairness testing for LLMs.
Problem

Research questions and friction points this paper is trying to address.

Prioritizing metamorphic relations to detect fairness issues in LLMs
Improving efficiency in bias detection with sentence diversity ranking
Reducing computational cost while maintaining high fault detection rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prioritizes metamorphic relations for bias detection
Uses sentence diversity to rank test cases
Reduces computational cost without sacrificing effectiveness
Suavis Giramata
Computer Science Department, East Carolina University, Greenville, USA

Madhusudan Srinivasan
East Carolina University
Software Testing and Verification

Venkat Naidu Gudivada
Computer Science Department, East Carolina University, Greenville, USA

Upulee Kanewala
Associate Professor, University of North Florida
Software Testing and Software Engineering