đ¤ AI Summary
Existing bias detection frameworks are ill-suited for e-commerce contexts, leaving gender bias in large language model (LLM)-generated product descriptions underexplored.
Method: We propose the first e-commerceâspecific gender bias taxonomy, identifying three novel dimensionsâexclusionary norms, stereotyped representation, and performance disparityâencompassing phenomena such as size assumptions, attribute labeling, and persuasive language bias. Using data-driven qualitative coding and quantitative analysis, we compare outputs from GPT-3.5 and e-commerceâspecialized LMs on real-world product description tasks.
Contribution/Results: We uncover statistically significant gender biasesâincluding disproportionate clothing size attribution and imbalanced functional descriptor emphasisâacross both model types. This work establishes the first measurable benchmark and dedicated evaluation framework for gender bias assessment in e-commerce AI systems, enabling rigorous, domain-aware bias governance.
đ Abstract
While gender bias in large language models (LLMs) has been extensively studied in many domains, uses of LLMs in e-commerce remain largely unexamined and may reveal novel forms of algorithmic bias and harm. Our work investigates this space, developing data-driven taxonomic categories of gender bias in the context of product description generation, which we situate with respect to existing general purpose harms taxonomies. We illustrate how AI-generated product descriptions can uniquely surface gender biases in ways that require specialized detection and mitigation approaches. Further, we quantitatively analyze issues corresponding to our taxonomic categories in two models used for this task -- GPT-3.5 and an e-commerce-specific LLM -- demonstrating that these forms of bias commonly occur in practice. Our results illuminate unique, under-explored dimensions of gender bias, such as assumptions about clothing size, stereotypical bias in which features of a product are advertised, and differences in the use of persuasive language. These insights contribute to our understanding of three types of AI harms identified by current frameworks: exclusionary norms, stereotyping, and performance disparities, particularly for the context of e-commerce.