When Is Good Enough Good Enough? AI, Research, And Animal Advocacy
Over the past few years, artificial intelligence has rapidly shifted from a highly specific set of machine learning applications to an abundance of widely available generative tools used by the general public. Large language models (LLMs) in particular offer animal advocates the promise of putting a world of data and automation at their fingertips, and for a movement often constrained by limited resources, the allure is undeniable: AI offers a chance for us to streamline our least impactful tasks, freeing us up to focus on what makes the biggest difference for animals. However, this potential for time savings invites a critical examination of quality, and of what exactly those ‘least impactful’ tasks might be.
Why Is AI Important For Animal Advocacy?
AI is many things, and much of the discussion of AI in advocacy is focused on generative tools such as ChatGPT, Gemini, Claude, and others. Tactically, the hope is that generative AI can bolster and improve how advocates communicate their message and mobilize support. AI-driven analysis of social media trends, public sentiment, and demographic data could help organizations craft more effective campaigns, identify receptive audiences, and optimize their messaging strategies. Custom chatbots and AI assistants could provide personalized guidance to individuals seeking to adopt more animal-friendly lifestyles, answering questions about plant-based nutrition, cruelty-free products, or local advocacy opportunities at scale.
Behind the scenes, advocates are also hoping that generative AI integrations could enhance our productivity. There are many active, ongoing discussions among animal advocates about how AI could help us do our work — from researching to strategizing to writing to logistics — faster and more efficiently. This potential has received considerable attention because there is the work we want to be doing, and then there is the less exciting work that gets in the way.
The promise of AI assistance is that it could help us do less of the busywork, and provide an always-welcome boost to the more direct work we do for animals. Initiatives like the Amplify For Animals training cohort have cut through much of the hype and provided hundreds of animal advocates, from a broad range of organizations and working in many different positions, with practical training to these ends.
Is “Good Enough” Good Enough? A Deeper Look At Summarization As An AI Use Case
Perhaps most relevant to the branch of the movement dedicated to research and knowledge production is the use case of summarization. AI tools have placed more or less instant summarization in the hands of advocates (and the general public), and AI overviews are already cannibalizing search traffic across the Internet. Meanwhile, Faunalytics curates and maintains a voluminous Research Library, and our Research Library Manager Meghann Cant oversees dozens of volunteer writers from around the world who summarize journal articles and reports for us to publish. This is one of our key services to the movement: translating scientific knowledge for everyday use in animal advocacy.
Many in the movement have surmised that summarization is an obvious use case for AI. After all, LLMs are in many ways built to summarize vast amounts of information into shorter formats. What’s more, they’re quick — what might take a volunteer a few hours can be accomplished by an LLM in a matter of minutes or less. However, it’s easy to get carried away at the thought of massive time savings and ignore some of the more persistent and pernicious problems LLM summarization carries with it. Below, Meghann shares examples of these problems from her own experience of testing and using LLMs on a daily basis.
Two Truths And A Lie
LLMs are very confident in their output, and that confidence lowers our guard: humans are hardwired to believe information that is delivered confidently. Of course, LLMs “hallucinate,” but because we’re primed to take them at their word, the errors can be harder to spot.
Case in point: An LLM-generated summary of a qualitative paper closely paraphrased the argument the authors were making, but at one point, it completely made up a participant quote in support of that argument. Without verifying the summary line by line, we likely would’ve accepted the quote as is because it sounded so plausible. However, it was nowhere to be found in the original article — not even a similar version of it. This reinforces the importance of questioning everything LLMs assert down to the last detail.
Confirmation Bias (or Oops, Your Bias Is Showing)
Another reason why LLM errors can be so hard to spot is confirmation bias — we’re less likely to second-guess the information being presented when it confirms what we already believe, even if that belief is false.
Case in point: An LLM-generated summary stated that a specific demographic group was more likely to recognize signs of poor animal welfare in images on social media. Because that aligned with what we already believed to be true, we were surprised to discover that the original study actually found the exact opposite. This type of error shows that, unfortunately, a “gut check” isn’t always enough to identify errors, and that we should always be open to challenging LLM outputs (and our own assumptions!).
Bending The Truth (or Playing Fast And Loose With The Truth)
This issue is a little more insidious: rather than making a glaring error, LLMs sometimes overstate the data. It’s tempting to let these instances slide because they’re technically not incorrect — they’re more a case of reading between the lines, or of unsupported extrapolation. However, we would again stress the necessity of going back to the original source and asking ourselves whether the data actually supports a claim being stated with such certainty.
Case in point: An LLM-generated summary gave us detailed descriptions of some key barriers that stand in the way of people collecting and acting on data. But the original article only had the barriers listed in a figure without those additional details. While the descriptions seemed reasonable, they didn’t come directly from the source material and may not have been what the author intended or what the evidence actually found. They were merely inferences and LLM “speculation.”
With these types of errors at play, you may be wondering why we would bother using AI to generate summaries for our Library at all, given that in-depth accuracy checks eat considerably into the time savings. Or perhaps you’re thinking that a human is just as fallible; after all, people can (and do!) misinterpret data and exaggerate findings, too. However, in our experience, human errors of this kind are a lot less likely (unless, of course, the human is surreptitiously using LLMs in their work). Volunteers who are unsure whether they’ve fully understood a statistic or characterized an argument correctly are often upfront about their uncertainty. That honesty can make all the difference, especially when LLMs can be so confidently incorrect.
Some AI advisors have suggested that we test LLMs on subjects we’re experts in, so that we can catch errors more readily — and this approach might be a good one, keeping in mind that confirmation bias could still lead us astray. Inevitably, however, we will ask an LLM to do a task outside our scope of expertise, and LLMs don’t make the same errors consistently from one run to the next. We feel strongly that nothing can replace a human doing detailed fact-checking. There may come a point where LLMs no longer need verification, but their non-deterministic nature makes that more of a hope than a firm eventuality.
Still, generative AI does have the potential to help us be more efficient and there are ways to improve the process. Here are a few things we’ve found helpful in our AI journey so far:
- Asking other LLMs to check the output for accuracy: Working across several different accounts, we’ve had other models flag when the LLM in question has perhaps taken some liberties in its summary, straying a bit too far from the original argument or intention. This can help bolster — but not replace — fact-checking.
- Creating custom LLMs (Custom GPTs, Gems, Claude Skills & Projects, etc.): We’re optimistic about this approach because system prompts can instruct LLMs to think like an animal advocate, “try on” specific expertise, and make fidelity to the original source the most important factor in their response. This has helped cut down on errors significantly, though we still find instances where the LLM misconstrues, misstates, or mischaracterizes things. It’s important to remember, however, that instructing an AI model that “you are an animal advocacy expert” is akin to giving an actor instructions on how to play a role, rather than calling forth a model that is actually materially different. (A minimal sketch combining this approach with the cross-checking approach above appears after this list.)
- Using a source-grounded model (e.g., NotebookLM): Even custom LLMs need instructions to stick to the sources provided, while a source-grounded LLM is designed to only interact with what you give it. It can still make mistakes, of course, but fact-checking is much more straightforward because you can click on the citations and pinpoint exactly where the LLM drew the information from. However, in our experience, you may lose some creativity using a source-grounded model.
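To make the first two approaches concrete, here is a minimal sketch of a fidelity-first system prompt paired with a second-model cross-check. It assumes the OpenAI Python SDK and an API key in the environment; the model names, prompt wording, and two-model pairing are illustrative assumptions, not a description of our actual workflow.

```python
# A minimal sketch, assuming the OpenAI Python SDK (openai>=1.0) and an
# OPENAI_API_KEY environment variable. Model names and prompts are
# illustrative, not a fixed Faunalytics pipeline.
from openai import OpenAI

client = OpenAI()

# A system prompt that makes fidelity to the source the top priority.
SUMMARIZER_PROMPT = (
    "You are an animal advocacy research summarizer. Fidelity to the "
    "source text is your highest priority: never add quotes, statistics, "
    "or details that do not appear in the source. If the source is "
    "ambiguous, say so rather than guessing."
)

# A second model acts as an adversarial checker; it bolsters, but does
# not replace, human fact-checking.
CHECKER_PROMPT = (
    "You are a fact-checker. Compare the summary against the source text "
    "and list every claim, quote, or number in the summary that you "
    "cannot locate in the source."
)

def summarize(source_text: str) -> str:
    """Generate a source-faithful summary (model choice is illustrative)."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SUMMARIZER_PROMPT},
            {"role": "user", "content": f"Summarize this study:\n\n{source_text}"},
        ],
    )
    return response.choices[0].message.content

def cross_check(source_text: str, summary: str) -> str:
    """Ask a different model to flag unsupported claims in the summary."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # deliberately not the summarizing model
        messages=[
            {"role": "system", "content": CHECKER_PROMPT},
            {"role": "user", "content": f"SOURCE:\n{source_text}\n\nSUMMARY:\n{summary}"},
        ],
    )
    return response.choices[0].message.content
```

Whatever the checker reports is a starting point for a human verifier, not a verdict: as noted above, a second LLM can miss errors or introduce new ones of its own.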
Overall, when it comes to data, we remain steadfast in adhering to a high standard. To a certain degree, the quality of the output depends on the quality of the prompts and instructions, but a strong final summary depends on our willingness to forgo time savings for the sake of ensuring accuracy. It can be tempting to downplay the issues, especially when the errors aren’t egregious or seem inconsequential, but this is a slippery slope. If we accept that an LLM summary of a given topic may only be 90% accurate, are we okay with 87% accuracy? What about 85%? What about 80%? Standards are only useful if we adhere to them, even when it’s inconvenient to do so.
Perhaps unsurprisingly, Faunalytics strongly believes that we owe the animals our commitment to rigor — otherwise, we aren’t really engaging in a data-driven approach. When it comes to providing you with the data you need to do your advocacy, we don’t believe that “good enough is good enough.”
Offloading Responsibility
The above exploration of summarization highlights a key problem: generative AI tools have a long way to go before we can trust their output without verification. Fortunately, animal advocates do not have to rely on LLMs to summarize research for them, as there are still libraries of human-summarized, -synthesized, and -verified information that they can count on as trustworthy sources.
In our recent review by Animal Charity Evaluators (ACE), one of the programs they questioned the impact of was our Research Library, stating that “AI tools can now be used to summarize available research in ways that are probably just as helpful as the summaries provided by Faunalytics.” This was surprising, given ACE’s own Responsible AI Usage Policy states plainly that “LLMs can generate erroneous output and should not be relied on as definitive sources of evidence.” Elsewhere, other advocates have pondered whether research orgs will even be necessary in the next five to 10 years — conjuring images of individual, atomized activists, sinking into the average internet user’s heroic fever dream of “doing their own research” with a level of analysis that would make a PhD jealous.
While we understand the emotional (and even tactical) desires behind such thinking — and setting aside the fact that at least some of this speculation is intended to drive social media engagement more than anything — the reality is a lot messier. LLMs may provide a good starting point for summarization or desk research, but they can be prone to focusing on a strange or narrow set of sources, glossing over important information, or even fabricating “facts” out of whole cloth, as noted above. “Hallucinations” that were common in earlier models have not been eliminated in subsequent releases, and it’s arguable that, due to LLMs’ non-deterministic nature, they may never be. Meanwhile, terms like “Deep Research” and “Thinking Mode” can give us a false sense of security — yes, they may involve further loops of information retrieval and output refinement, but much like “humanely raised,” they are ultimately industry marketing terms with no strict, common, or legal definition.
By eschewing advocate-led research and summarization, and placing research labor entirely in the hands of individuals using AI, we believe animal advocates come out at a deficit: the work of verification gets offloaded from a trusted handful of specialized organizations onto every individual advocate. In a worst-case scenario, advocates forgo fact-checking their AI-led research altogether, trust AI outputs by default, and use misleading information to inform their work for years to come. It may be tempting to brush this scenario off, but the data shows we should take it dead seriously: a recent study found that only 8% of respondents always check the sources that AI overviews provide. Considering how social desirability bias may factor into a study like this, the actual number could be even lower!
Of course, this is logical: one of the oft-repeated value propositions of AI is time savings. Verifying everything an LLM tells you whittles away at the time you’re supposed to be saving, and once you start finding errors, the whole process becomes much less efficient.
Understanding The Loop
On a broader level, and perhaps even more importantly, continuing to produce, publish, and summarize research is vital to the ongoing usefulness of AI itself: when you ask LLMs animal advocacy-related questions, models regularly cite Faunalytics, Bryant Research, Animal Ask, Rethink Priorities, and other research orgs’ resources, and provide links to our Library among others. For AI to remain useful, it requires the constant ingestion of high-quality (i.e., human-generated) data. Remove this inflow of data from the loop, and it’s arguable that generative AI would become less useful for animal advocates over time. What’s more, as “AI slop” and low-effort content proliferate in earnest, decreasing the publication of responsible, human-vetted content will only increase the space taken up by content that is inaccurate, misleading, or both.
Building a data-driven movement is not (and should not be) the work of individual advocates alone. Movement strategy and action are ultimately group efforts that must be undertaken by all of us, together. What’s more, animal advocacy research is not a drain on the movement’s resources — far from it. If anything, our movement may be underfunding research; seen another way, the movement is getting a great deal of value from a very nominal investment, and continued investment in that value is crucial to our future, even if we plan to use AI in a much more thorough way.
The Future Of Knowledge Dissemination In An AI-Forward Context
Faunalytics views data-informed decision-making not as a simple task to be automated, but as a critical foundation to be continually cultivated. Doing research and finding data to inform your organization’s strategy and action is not busywork: it should be a core activity that advocates take seriously. Our philosophy is one of intentionality: onboarding new technology strategically and deliberately, not for the sake of novelty, and never losing sight of second-order effects like atomization and isolation within our movement. We are stronger together, and it’s vital that we support each other in the further study of our best and most effective practices.
For the past two-and-a-half decades, Faunalytics has existed to encourage and strengthen data-driven approaches to animal advocacy; we strongly believe that we owe animals a commitment to strategy that is inspired by empathy and guided by data. To that end, we remain committed to ensuring that the data driving our advocacy remains a sharp, reliable instrument rather than a “good enough” blunt one, or a strategy we follow because it sounds correct and confirms our biases. We likewise remain optimistic that AI will be able to help those animal advocates who use it carefully and in a clear-eyed way.
Moving forward, Faunalytics is continuing to experiment with how AI can boost our movement’s collective work while maintaining our standards for accuracy. We’ve begun publishing carefully vetted AI summaries in our Research Library, focusing on long reports that would be onerous to assign to volunteers. We’ve also been working with experts in the field to develop an AI tool that will reliably synthesize the data in our Library, providing instant overviews of animal advocacy issues using sources we’ve already vetted, read, and summarized; we hope to roll it out in early 2026. And finally, Faunalytics’ AI Committee is meeting regularly to ensure that we stay abreast of new developments and keep up our momentum.
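For readers curious about the general shape of such a tool, here is a minimal sketch of source-grounded synthesis over a vetted library, using TF-IDF retrieval from scikit-learn. The example entries, the retrieval method, and the prompt template are hypothetical stand-ins, not a preview of the tool itself.

```python
# A minimal sketch of source-grounded synthesis over vetted summaries.
# The library entries and prompt template are hypothetical stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-ins for human-vetted library summaries.
LIBRARY = [
    "Summary A: A survey of plant-based eating habits among U.S. adults.",
    "Summary B: An experiment on message framing in farmed animal campaigns.",
    "Summary C: A review of welfare indicators for farmed fish.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k vetted summaries most similar to the question."""
    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform(LIBRARY + [question])
    # Similarity between the question (last row) and each summary.
    sims = cosine_similarity(vectors[-1], vectors[:-1]).ravel()
    return [LIBRARY[i] for i in sims.argsort()[::-1][:k]]

def build_prompt(question: str) -> str:
    """Constrain a downstream model to the vetted sources only."""
    sources = "\n\n".join(retrieve(question))
    return (
        "Answer using ONLY the vetted summaries below. If they do not "
        "contain the answer, say so rather than speculating.\n\n"
        f"SOURCES:\n{sources}\n\nQUESTION: {question}"
    )

print(build_prompt("What do we know about farmed fish welfare?"))
```

The key design choice is that retrieval happens over summaries a human has already vetted, so the fact-checking burden shifts from a model’s open-ended output back to a trusted corpus.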
Just as we’re encouraged by initiatives like Amplify For Animals and how they’ve been giving advocates practical, actionable training, we’re encouraged by advocates who are staying informed about developments in the AI field, while remaining critical and keeping standards high. In a forthcoming blog, we’ll zoom out from the research realm and take a broader look at the state of the movement in relation to AI, with a specific focus on how the politics, economics, and ethics of the AI field more broadly may impact animal advocacy strategy.

