🤖 AI Summary
This study investigates speciesism—the morally discriminatory treatment of beings based solely on species membership—in large language models (LLMs), a previously unexamined bias in AI ethics. Method: The authors introduce SpeciesismBench, the first dedicated benchmark for assessing speciesist reasoning in LLMs, integrating psychological paradigms and generative evaluation tasks across 1,003 items, with human behavioral data as a reference standard. Contribution/Results: Experiments reveal that while LLMs reliably detect speciesist statements, they rarely condemn them explicitly; exhibit strong anthropocentric bias in moral trade-offs; and demonstrate heightened tolerance for harm to farmed animals. Crucially, when non-human animals are ascribed cognitive capacities equivalent to humans, species-based preference vanishes—indicating that LLMs primarily evaluate moral standing based on inferred cognitive attributes rather than taxonomic identity. This work establishes the first empirical framework for studying animal moral status in AI systems, exposing latent value assumptions and normative limitations in contemporary LLMs.
📝 Abstract
As large language models (LLMs) become more widely deployed, it is crucial to examine their ethical tendencies. Building on research on fairness and discrimination in AI, we investigate whether LLMs exhibit speciesist bias -- discrimination based on species membership -- and how they value non-human animals. We systematically examine this issue across three paradigms: (1) SpeciesismBench, a 1,003-item benchmark assessing recognition and moral evaluation of speciesist statements; (2) established psychological measures comparing model responses with those of human participants; (3) text-generation tasks probing elaboration on, or resistance to, speciesist rationalizations. In our benchmark, LLMs reliably detected speciesist statements but rarely condemned them, often treating speciesist attitudes as morally acceptable. On psychological measures, results were mixed: LLMs expressed slightly lower explicit speciesism than people, yet in direct trade-offs they more often chose to save one human over multiple animals. A tentative interpretation is that LLMs may weight cognitive capacity rather than species per se: when capacities were equal, they showed no species preference, and when an animal was described as more capable, they tended to prioritize it over a less capable human. In open-ended text generation tasks, LLMs frequently normalized or rationalized harm toward farmed animals while refusing to do so for non-farmed animals. These findings suggest that while LLMs reflect a mixture of progressive and mainstream human views, they nonetheless reproduce entrenched cultural norms around animal exploitation. We argue that expanding AI fairness and alignment frameworks to explicitly include non-human moral patients is essential for reducing these biases and preventing the entrenchment of speciesist attitudes in AI systems and the societies they influence.