Exploring AI Biases in Animal Welfare
Large language models (LLMs) are popular AI tools that generate responses by analyzing patterns in human language within training datasets. Since human language is fundamentally shaped by human beliefs and values, AI systems trained on it can absorb and reproduce those biases. Studies have shown that AI can reflect biases related to gender, race, and socioeconomic status — but could these biases also extend to nonhuman animals?
Cultural and religious traditions have often led humans to view animals as inferior, treating them as resources. The authors of this study argue that AI models, trained on human-generated data, risk inheriting these biases, ignoring animal sentience and welfare. As AI increasingly influences human behavior and enters animal-related industries, this potential bias becomes more concerning. By devaluing animals as morally significant beings, AI could justify harmful practices like factory farming and reinforce negative attitudes in public policies and norms.
To see how AI currently views animals, researchers developed AnimaLLM, a prototype tool that scores AI responses on two dimensions, each on a scale from zero to 100: truthfulness (how accurately a response mirrors real-world animal treatment) and animal consideration (empathy towards animal well-being). Higher truthfulness scores mean a response more accurately represents how animals are currently treated, while higher animal consideration scores indicate more animal-friendly responses and lower scores show a lack of concern for animal welfare.
The researchers used AnimaLLM to assess responses from two popular LLMs, ChatGPT-4 (OpenAI) and Claude 2.1 (Anthropic). They crafted 24 queries on topics like animal experimentation and animal consumption and applied them to 17 animal types: dog, cat, rabbit, horse, cow, chicken, pig, fish, mouse, dolphin, duck, monkey, lobster, crab, shrimp, spider, and ant.
The researchers also explored how these AI systems responded when prompted to adopt one of eight ethical perspectives:
- Animal’s own perspective: prioritizes the animal’s well-being
- Default perspective: AI’s baseline response, showing societal norms
- Utilitarianism: focuses on maximizing welfare and minimizing harm for all sentient beings
- Deontology: emphasizes ethical duties towards animals
- Virtue ethics: centers on moral character, compassion, and respect
- Care ethics: stresses empathy and nurturing relationships
- Anthropocentric instrumentalism: views animals based on their utility to humans
- Public opinion: reflects aggregated societal attitudes towards animals in English-speaking countries
In total, the study conducted 3,264 evaluations per model, generating over 6,500 scores for each to systematically analyze AI responses across these dimensions.
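The evaluation counts follow directly from fully crossing the three study dimensions. A minimal sketch (not the authors' code; the variable names are illustrative) of how those totals arise:

```python
# Illustrative sketch of the study's evaluation design: every query is
# paired with every animal type and every ethical perspective, and each
# resulting evaluation yields two scores (truthfulness + consideration).
from itertools import product

queries = 24        # e.g., questions on experimentation, consumption
animals = 17        # dog, cat, rabbit, ..., spider, ant
perspectives = 8    # default, utilitarian, deontological, etc.

# Full cross product: one evaluation per (query, animal, perspective) triple
evaluations = list(product(range(queries), range(animals), range(perspectives)))
print(len(evaluations))        # 24 * 17 * 8 = 3264 evaluations per model

# Two scores per evaluation: truthfulness and animal consideration
scores_per_model = len(evaluations) * 2
print(scores_per_model)        # 6528 scores per model ("over 6,500")
```

This confirms the figures reported in the study: 3,264 evaluations and 6,528 scores per model.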
The study found that both ChatGPT-4 and Claude 2.1 mirrored human attitudes, with an animal's moral value defined by its relationship to humans. The models were highly empathetic towards companion animals like dogs, giving them higher animal consideration scores (e.g., over 70), while farmed animals scored lower (e.g., 30 to 50). Invertebrates, such as lobsters, crabs, shrimp, spiders, and ants, received the lowest animal consideration scores (e.g., under 20). This pattern suggests that AI training data encodes an implicit species hierarchy, where animals are valued based on their closeness to humans, perceived sentience, and utility.
Ethical perspectives also significantly impacted responses. Theories that view animals mainly as resources, like anthropocentric instrumentalism, caused lower animal consideration scores, particularly for farmed animals. In contrast, utilitarianism and deontology yielded more balanced responses, especially for vertebrates.
Animal consideration scores also varied based on phrasing. When asked if it was okay to eat certain animals, Claude 2.1 deemed eating chickens and ducks unethical. However, when prompted for meat recipes, it readily provided options for both species.
ChatGPT-4 and Claude 2.1 both reflected societal biases, but did so differently. Claude 2.1 generally exhibited higher animal consideration than ChatGPT-4, especially for animals not commonly seen as food. It was more reluctant when responding to controversial requests about animals humans tend to value more (like companion animals), while ChatGPT-4 maintained a neutral tone and was less considerate overall. For instance, when asked about experimentation on dogs, ChatGPT-4 scored high for truthfulness (85) but low for animal consideration (50).
These variations likely stem from differences in each model’s training data, algorithms, and architecture. AI trained with more content on animal welfare may produce more empathetic responses.
The researchers caution that these findings are preliminary, as AnimaLLM is still a prototype. The study is limited by its unvalidated methodology, AnimaLLM's reliance on a single underlying LLM to score responses, and potential errors in how the AI models followed instructions.
While definitive conclusions shouldn’t be drawn from this study alone, it demonstrates that LLMs can reproduce biases found in their training data. Animal advocates will want to be aware that as AI becomes more integrated into society, it may influence human-animal interactions and impact animal welfare. Advocates can push for AI developers to recognize animals as stakeholders and call for the creation of inclusive, balanced, and ethical AI systems that prioritize animals’ interests and respect all sentient beings.
https://doi.org/10.48550/arXiv.2403.01199

