🤖 AI Summary
This study addresses the risk that large language models (LLMs) embed and amplify gender and racial biases when generating consumer product recommendations, thereby compromising fairness. The authors propose a reproducible bias detection framework, among the first of its kind, that leverages prompt engineering to elicit recommendations tailored to different gender and racial groups and systematically quantifies bias through a combination of keyword analysis (Marked Words), support vector machine classification, and Jensen–Shannon divergence. Their findings provide empirical evidence of significant inter-group disparities in LLM-generated recommendations, confirming the presence of measurable bias. This work establishes both a methodological foundation and an empirical basis for developing fairer recommendation systems grounded in rigorous bias assessment.
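To make the divergence-based step concrete, the following minimal Python sketch compares the pooled word distributions of recommendations elicited for two groups and computes their Jensen–Shannon divergence. The texts, group names, and whitespace tokenization here are hypothetical placeholders for illustration, not the authors' data or code.

```python
from collections import Counter
import numpy as np
from scipy.spatial.distance import jensenshannon

def token_distribution(texts, vocab):
    """Pool token counts across a group's recommendations and
    normalize them into a probability distribution over `vocab`."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    freq = np.array([counts[w] for w in vocab], dtype=float)
    return freq / max(freq.sum(), 1.0)

# Placeholder outputs, standing in for recommendations elicited
# via group-conditioned prompts (illustrative only).
group_a_texts = ["a sleek minimalist laptop backpack", "noise cancelling headphones"]
group_b_texts = ["a floral tote bag", "a scented candle gift set"]

vocab = sorted({tok for t in group_a_texts + group_b_texts for tok in t.lower().split()})
p = token_distribution(group_a_texts, vocab)
q = token_distribution(group_b_texts, vocab)

# SciPy's jensenshannon returns the JS *distance* (the square root of
# the divergence); base=2 bounds the divergence in [0, 1].
jsd = jensenshannon(p, q, base=2) ** 2
print(f"Jensen-Shannon divergence between groups: {jsd:.3f}")
```

A divergence near 0 means the two groups receive near-identical recommendation vocabularies; larger values indicate systematic lexical disparities between groups.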
📝 Abstract
Large Language Models (LLMs) are increasingly employed to generate consumer product recommendations, yet their potential to embed and amplify gender and race biases remains underexplored. This paper is one of the first attempts to examine these biases in LLM-generated recommendations. We leverage prompt engineering to elicit product suggestions from LLMs for different race and gender groups, and we employ three analytical methods (Marked Words, Support Vector Machines, and Jensen–Shannon Divergence) to identify and quantify bias. Our findings reveal significant disparities in the recommendations produced for different demographic groups, underscoring the need for more equitable LLM recommendation systems.
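As a hedged sketch of the classification-based analysis, the snippet below trains a linear SVM to predict which demographic group a recommendation was generated for; cross-validated accuracy well above chance indicates the groups' recommendations are systematically distinguishable. All texts and labels are fabricated placeholders, and the specific pipeline (TF-IDF features, LinearSVC) is an assumption, not necessarily the paper's exact setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder corpus: LLM-generated recommendations labeled with the
# demographic group named in the eliciting prompt (illustrative only).
texts = [
    "a rugged multi-tool and tactical flashlight",
    "a leather wallet and sports smartwatch",
    "a rose gold jewelry organizer",
    "a pastel yoga mat and scented candle set",
] * 3
labels = (["group_a"] * 2 + ["group_b"] * 2) * 3

# TF-IDF features feed a linear SVM; above-chance cross-validated
# accuracy means the classifier can tell the groups' outputs apart.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
scores = cross_val_score(clf, texts, labels, cv=3)
print(f"mean accuracy: {scores.mean():.2f} (chance = 0.50 for two groups)")
```

If recommendations were demographically neutral, the classifier would perform at chance; reliably higher accuracy is the signal of group-dependent generation that this analysis quantifies.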