Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems

📅 2025-06-04
🤖 AI Summary
This study identifies critical gaps in the applicability of existing representational harm measurement tools for practical evaluation of LLM systems. Method: Through semi-structured interviews with 12 practitioners and thematic coding, we systematically identify two core challenges—“requirement misalignment” (e.g., theoretical assumptions diverging from real-world deployment contexts) and “adoption barriers” (e.g., poor integrability, low interpretability)—and distill six key practitioner needs and four institutional constraints. Grounded in measurement theory and pragmatic design frameworks, we derive seven co-design principles. Contribution/Results: These principles guide the transformation of academic metrics into deployable, interpretable, and iteratively improvable engineering solutions. The work establishes a theoretically grounded and empirically informed bridge between research and practice in LLM fairness assessment, advancing translational fairness evaluation across academia, industry, and policy domains.

📝 Abstract
The NLP research community has made publicly available numerous instruments for measuring representational harms caused by large language model (LLM)-based systems. These instruments have taken the form of datasets, metrics, tools, and more. In this paper, we examine the extent to which such instruments meet the needs of practitioners tasked with evaluating LLM-based systems. Via semi-structured interviews with 12 such practitioners, we find that practitioners are often unable to use publicly available instruments for measuring representational harms. We identify two types of challenges. In some cases, instruments are not useful because they do not meaningfully measure what practitioners seek to measure or are otherwise misaligned with practitioner needs. In other cases, instruments (even useful ones) are not used by practitioners due to practical and institutional barriers impeding their uptake. Drawing on measurement theory and pragmatic measurement, we provide recommendations for addressing these challenges to better meet practitioner needs.
Problem

Research questions and friction points this paper is trying to address.

Assess alignment of public instruments with practitioner needs for measuring LLM harms
Identify challenges in using available tools for evaluating representational harms
Propose solutions to bridge gaps between research tools and practical applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

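Semi-structured interviews with 12 practitioners to assess their measurement needs
Identification of misalignment between existing harm measurement instruments and practitioner needs
Recommendations grounded in measurement theory and pragmatic measurement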