Study design is the planning of the process surrounding a survey or other data-collection effort. It covers who will be surveyed, when they will be surveyed, how they will be approached, how the data will be analyzed, and other procedural issues.
Step 1: Are You Sure You Need To Design A Study?
A properly designed study that will provide reliable findings isn’t cheap, quick, or simple, even for experts. And “good enough” generally isn’t. Any study will produce data, but if it isn’t designed well, those results are likely to be unhelpful at best, catastrophically misleading at worst.
Before considering a new study, it is well worth pouring several hours into a literature review: That is, searching for research on similar topics that has already been done. Well-conducted research on something a little different from what you’re interested in will provide you with much better information than poorly-conducted research on your exact interests.
Sources to check:
Make sure to try similar keywords to avoid missing anything!
Step 2: Defining The Study’s Purpose
Every aspect of a study should be guided in part by the purpose of the study. It is often easiest to think of the purpose of the study in terms of a list of questions that it should answer. Examples include:
- “How many people can we persuade with this leaflet?”
- “Should we include a health argument in our new video?”
- “Are our Facebook followers really involved with our organization?”
If there are many questions that the study should answer, it is helpful to rank them in order of priority. The study can be designed to answer the most important questions as accurately as possible, and other questions can be addressed within that framework if there is enough time and space for them.
Step 3: Choosing A Study Type
This section will help you determine what type of study you should run. The options we cover are: Surveys, Interviews & Focus Groups, Experiments, and Program Evaluations.
“How many people support a ban on slaughterhouses?”
“What percentage of people with companion animals take them to the vet every year?”
“What do most people think is the most important animal issue?”
Conduct a survey if ALL of the following statements are true:
- You want numbers or percentages like in the first two examples above
- Participants can easily answer all of your questions by choosing from a list of options or writing a few sentences (all three examples above)
- You think people have good insight into their behavior or attitudes and will answer honestly (there are many circumstances when people can’t predict their own behavior, haven’t formed an attitude, or won’t answer honestly for a variety of reasons)
- You are not trying to influence the numbers/percentages you’re measuring (e.g., with a video, a leaflet, or anything else) – if you want to know whether your efforts are effective, go to the section on experiments
- You can select participants from your target population either completely at random (e.g., hiring a company like Positly, SSI, Ipsos, or YouGov to get a nationally-representative random sample) or by including everyone (e.g., asking everyone who visits your event to do the survey)
Finding Or Writing Survey Questions
This key aspect of survey design deserves its own section, here.
“What do people see as the pros and cons of cultured meat?”
“What kind of experiences do people have at the vet in an emergency?”
“How do people determine which animal causes they will support?”
Conduct interviews or focus groups if ANY of the following statements are true:
- You care more about getting rich, detailed information on a single topic or a few closely related topics than about a broad range of questions
- You want answers that are qualitative in nature—you can talk about patterns or themes that appear in participants’ responses, but you can’t talk about how many people think a certain thing
- You don’t know enough about a topic to be able to say what the most important aspects of it are before running the study
- Your population of interest is rare or difficult to reach (e.g., farm sanctuary operators, homeless vegetarians)
Interviews and focus groups are often the best starting point for a subject area where very little research has previously been done. The example questions above could all be asked on a survey instead, if you knew enough about the subject to write the survey questions. For example, you could provide a list of possible pros and cons for participants to check off, or scales asking about different aspects of their experiences with veterinary emergencies (speed, demeanor, effectiveness, etc.).
If no one knows much about the subject, your survey questions could accidentally focus on things that aren’t important. Talking to participants in a semi-structured interview or focus group allows you to collect a lot of open-ended data on a subject, to get a sense of what’s important and what you may have overlooked.
These can be good studies on their own, or they can be a great starting point for quantitative research. Once you know the important questions, asking them on a representative survey or looking at behavior in an experiment can give you a lot more information.
Designing An Interview Or Focus Group Guide
Interviews and focus groups can seem deceptively simple. Here are a few key points to remember:
- Ask the same set of questions of everyone you interview, or in every focus group. It’s okay to follow up on answers to get more detail, but you don’t want to bias your results by asking one question more often than another, or asking different questions of different people (e.g., don’t ask your youngest participants if they use their phones to access your website but not your older participants). This is the basic format of a semi-structured interview/focus group.
- Avoid leading questions, and triple-check for assumptions you might be making. For example, don’t ask “Did you choose your cat based on appearance or behavior?” That leads people to think about particular reasons when they could have had another one entirely. Instead, be more general: “Why did you adopt your cat?”
- Don’t forget the effect that your behavior has on your participants. You should treat all participants as similarly as possible, and be neutral and polite. Remember that you are there to lightly guide the conversation, not to talk a lot. And never assume you know what a participant is going to say before they say it. Being overly involved can influence participants’ answers because it may make them want to please, amuse, annoy, or impress you.
- Don’t include too many questions. If you’re asking questions with complex or multi-part answers, it’s particularly important to allow time for your participant(s) to talk through it naturally. Broadly speaking, you can probably have a good discussion about 10-12 questions in a one-on-one interview lasting 1.5 hours. For a focus group, assume you can include half as many because multiple people will contribute to each question.
However, it’s very hard to come up with a rule of thumb because it depends on question complexity, how much the questions are related to one another, how many people are participating, and who your participants are. Ask your questions in order of importance and, if you can, do a realistic practice run with a non-participant to see how long they’ll naturally tend to talk about each question.
- Keep your question language simple, and feel free to include a rephrasing to help people understand. For example: “What were the main benefits and drawbacks of transitioning to veganism for you? Can you tell me about the pros and cons?”
- Also keep your question wording natural—speech needs a more natural flow than written text, so it’s okay not to be grammatically perfect. But avoid slang unless it’s the only way to be clear or if you’re echoing a term already used by a participant.
Recording The Data
For focus groups and interviews, you have two main options: audio-recording (and possibly later transcription) or note-taking.
Audio-recording is more thorough and accurate but harder to analyze later because of the amount of data to wade through. If you choose this approach, you must get all participants’ consent to be recorded, and take strong measures to protect the recordings. Confidentiality and data security are very real concerns with detailed interview data because participants may be identifiable even if they don’t use names.
Typed notes on participants’ comments are much easier to analyze but you risk missing important points. If you choose this approach, we strongly recommend having a separate note-taker and moderator (i.e., the person asking the questions and listening to the answers shouldn’t also try to take notes). Make sure that the note-taker is a fast typist! Don’t try to get everything down verbatim, just key points, but occasional direct quotes can be nice to have when they summarize an issue well.
Analyzing Themes In The Data
Analyzing qualitative data systematically takes time and training. You will need to develop categories of responses and code participants’ comments into those categories. It’s best if this process begins before you collect any data. For an example of the process, see Faunalytics’ research design document for our study of consumer reactions to corporate cage-free egg commitments. That study used data from Facebook comments rather than interviews or focus groups, but the analysis process is essentially the same.
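To make the structure of this coding process concrete, here is a toy sketch in Python. The categories and keywords are entirely hypothetical, and real qualitative coding is done by trained human coders reading each comment; simple keyword matching only illustrates how comments get tallied into categories.

```python
# Hypothetical codebook an analyst might develop BEFORE collecting data.
CATEGORIES = {
    "animal_welfare": ["welfare", "cruelty", "suffering"],
    "price": ["expensive", "cost", "price"],
    "skepticism": ["marketing", "pr stunt", "greenwashing"],
}

def code_comment(comment, categories=CATEGORIES):
    """Assign a comment to every category whose keywords it mentions.
    A human coder would judge meaning, not just keywords; this sketch
    only shows the category-tallying structure of the analysis."""
    text = comment.lower()
    return [name for name, words in categories.items()
            if any(w in text for w in words)]

comments = ["Too expensive for me", "Glad they care about animal suffering"]
coded = [code_comment(c) for c in comments]
```

Developing the codebook before data collection, as recommended above, is what makes the eventual tallies comparable across coders and comments.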
“Which of these three leaflets is the most convincing that people should buy cruelty free products?”
“Which method of selecting animal guardians results in fewer problems and returns: an adoption questionnaire or an interview?”
“Does this video make people eat less meat?”
Simply Psychology has an excellent, approachable overview of different types of experiments and when you should use them.
In general, conduct an experiment if ALL of the following statements are true:
- You are most interested in the influence or cause of something (e.g., a video, a program, a visit) on people’s attitudes or behavior
- You don’t want to explore a broad topic (e.g., Faunalytics’ lapsed veg*n study was very broad, looking at a large number of factors related to veg*nism)—if you do, an experiment will be too limiting
- You are willing and able to randomly assign participants to an experimental group (e.g., by flipping a coin or using a random number generator). Neither the researcher nor the participant can choose which version of the experiment a participant gets, because that would completely invalidate the results. Experiments are built on the principle of all groups being the same, on average, at the start. That condition is only met if assignment is truly random.
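As a sketch of what truly random assignment looks like in practice, the snippet below shuffles a balanced list of condition labels so the groups come out (nearly) equal in size. The condition names are placeholders; a coin flip or random number generator works just as well for small studies.

```python
import random

def assign_conditions(participant_ids, conditions, seed=None):
    """Randomly assign each participant to one experimental condition.

    Shuffling a balanced list of labels guarantees near-equal group
    sizes, which repeated coin-flipping does not."""
    rng = random.Random(seed)
    n = len(participant_ids)
    # Repeat the condition labels until the list covers every participant.
    labels = (conditions * (n // len(conditions) + 1))[:n]
    rng.shuffle(labels)
    return dict(zip(participant_ids, labels))

assignments = assign_conditions(range(100), ["video", "control"], seed=42)
```

The key property is that neither the researcher nor the participant influences the assignment, which is what keeps the groups the same on average at the start.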
Determining Your Experimental Conditions
Experiments are used to look at the difference between things. Always – it’s the only way they work. Therefore, any question you have has to be framed in terms of a difference. The first example above looks at the difference between three leaflets, the second at the difference between two methods of selecting adopters. These are examples of A/B Testing, a specific type of experiment that you can read more about in the next section. In the third example, the difference is harder to spot… it’s the difference between watching an advocacy video and not watching an advocacy video.
You may also have heard experiments referred to as Randomized Control Trials or RCTs. Not all experiments are RCTs, but all RCTs are experiments. The important factor is whether you have a control group or not. The first two examples could be studied without a control group (though you might want one anyway), but the third has to have one.
Figuring out your control group—whether you need one and what it should be—is the hardest part of designing an experiment. If you don’t have experimental training, we recommend that you consult with an expert.
Why to Include a Control Group: An Example
Faunalytics conducted a large-scale experiment looking at Animal Equality’s video outreach (the report is available here). We wanted to see whether there was a long-term difference in the amount of pork consumed by people who watched a 360-degree VR video versus a 2D tablet version. (That’s two conditions: 360 video & 2D video.)
But think about how a study with just those two conditions could work out: Either we find a difference or we don’t. If we do, great! One video works better than the other. But if there’s no difference, we have no way of telling whether it’s because neither video works or both do. Always think through your possible outcomes and how you would interpret them!
The most obvious way to answer that question is to ask about pork consumption before and after the videos (and we did) but there are major problems with interpreting pre-post data like that. The biggest is that your participants will be very aware of what kind of study they’re in: It’s being run by animal advocates and has questions about animal suffering, so it’s clear that their pork consumption is “supposed to” go down. It’s almost a guarantee that, on average, people will report a reduction in pork consumption from before to after the video, just because of social desirability (wanting to look good) or demand characteristics of the study (saying what they think the researcher would want to hear).
The only way to see whether the videos cause a real reduction in pork consumption is to include a control group whose experience in the study is as similar as possible to the experience of the people who watch the video—but without the video. Even with a great study design, it’s probably impossible to avoid social desirability and demand entirely. So the best thing to do is to keep the control group as similar as possible so that those things affect everyone in the study about equally. Then, any difference between the amount of pork consumed by the control group and by the people who watched the videos can only be due to the videos themselves.
This is why you need a control group in most experiments, but it’s only the beginning of figuring out what your control group will be. You need your control participants to complete the same questions/measures as everyone else so you can compare the groups, but should they do anything else? You need to think through every difference between the control group and video groups, because every difference is a possible explanation for a difference in pork consumption if you find one. The fewer differences there are, the easier it is to interpret the results.
Should your control group still watch a video? If yes, a video with animals in it or a completely neutral video? If they don’t watch another video, is the study going to be shorter for them, and could that affect their responses? Might you get a different type of person in the control group if there’s no video, because they’re signing up for a study that only involves answering questions, not multimedia?
There are ways to get around many of these questions—and you can see our report on the Animal Equality study for the solution we came up with—but none of them are perfect. The important point is to remember that experimental design is neither straightforward nor easy. If you haven’t done it before, you should get some help to make sure nothing is overlooked.
Finding Or Writing Survey Questions
If your experiment includes survey questions, see this section.
“Does a donation request focusing on how much funding we need bring in more money than a request focusing on what we’ll use it for, or vice-versa?”
“Which of these two subject lines produces the most emails opened?”
“Which of these three Facebook ads gets the most clicks?”
A/B testing is a type of experiment where you compare two versions of something to see which performs better. For example, this type of test can compare two donation appeals to see which one leads to the most giving, or two different email subject lines to see which gets opened the most. This approach has grown popular across a range of organizations as a way to test assumptions. (For those familiar with research methods, A/B testing is really just a between-subjects experimental design with one independent variable that has two conditions/levels.)
Because A/B testing is a type of experiment, you need to randomly assign participants into one of the two conditions. You must also change only one thing between the two conditions–for example, if you change both an image and a headline, you won’t know which change led to the observed effect. A/B/C testing is also an option: presenting three or more different versions of something, like three different headlines or three different images.
Note that many marketing platforms like MailChimp and Facebook have A/B testing options built right in!
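If your platform doesn’t compute significance for you, the comparison behind an A/B subject-line test is typically a two-proportion z-test. The sketch below uses made-up open counts; only the structure of the calculation is the point.

```python
import math

def two_proportion_z_test(opens_a, sent_a, opens_b, sent_b):
    """Two-sided z-test for a difference between two open rates.
    Returns the z statistic and the two-sided p-value."""
    p_a, p_b = opens_a / sent_a, opens_b / sent_b
    p_pool = (opens_a + opens_b) / (sent_a + sent_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / sent_a + 1 / sent_b))
    z = (p_a - p_b) / se
    # Convert |z| to a two-sided p-value via the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Made-up counts: subject line A opened 220/1000 times, B opened 180/1000.
z, p = two_proportion_z_test(220, 1000, 180, 1000)
```

A p-value below 0.05 would suggest the difference in open rates is unlikely to be due to chance alone.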
“Does this workshop lead to the desired outcomes?”
“Does visiting a farm sanctuary alter people’s attitudes towards animals?”
“How does attending a vegan food festival alter perceptions of vegan diets?”
Use a pretest/posttest design if ALL of the following statements are true:
- You want to learn about the effects of an intervention
- You can’t randomize people into a control condition
- You’re looking to compare how people answered or behaved before the intervention to how they answered or behaved after it
Sometimes an experimental design isn’t possible. The first issue is the fact that participants need to be randomized to conditions in an experimental design. If we are trying to measure the outcome of a workshop we’re hosting, we can’t just tell half the people who showed up that they won’t be taking part in the workshop after all because they’re in the control group!
In these cases we need another design, and pretest-posttest is a common one. Here, we ask participants the same set of questions twice, before and after the workshop. For instance, we might want to assess people’s attitudes toward farmed animals before and after a workshop about animal sentience to see if the workshop was effective in improving positivity.
However, it can be hard to be sure that any changes we observe are really due to the workshop content–we’ve lost some of our ability to control for other explanations with this design. So for example, maybe our workshop wasn’t very effective, but it offered free coffee and muffins that made people happier so they answered the questions more positively than they did at the beginning.
You might also have the opposite problem, of not seeing a difference in your pretest and posttest scores even if your workshop really was effective. If the workshop isn’t very long, your participants will likely remember how they answered the questions the first time around and may just say the same thing again without thinking about it (this phenomenon is known as anchoring and influences behavior in a variety of interesting ways).
Pretest-Posttest With Different Participants
Sometimes programs are so busy that there just isn’t time to ask participants the same set of questions twice, or we want to avoid anchoring. In this case, a variation on the standard pretest-posttest design can be used. Although this is technically called a “nonequivalent, no treatment control group design,” it’s really just a pretest-posttest with different participants doing the pretest and the posttest. It’s often a better option than the standard before and after.
In this design, each participant only does the survey once: either before the workshop or after it. As long as you can randomly assign participants to the pretest or posttest survey group (e.g., by flipping a coin for each one as they walk in), the group that does the survey before the workshop will be similar, on average, to the people who do the survey afterwards–similar enough that you can compare the results from the two groups like you would a control group.
However, the two groups aren’t quite the same, so you do have to be cautious in how you interpret any differences between them. Perhaps people answer certain questions differently as the day progresses and they get more tired. Or maybe rain clouds cleared and it became a sunny day and their mood went up just because of that. Think about the content of your questions and whether you might expect that kind of issue. As you can probably see from these examples, questions that are very based on positivity/negativity are particularly susceptible to this type of problem.
These problems also get worse the longer the gap between pretest and posttest. Moods, tiredness, weather, and other factors will all change more if you’re measuring before and after an eight-hour workshop than a two-hour movie. If it’s a five-minute video, you can probably discount the effects of time altogether.
To sum up this section, we always recommend using an experimental design whenever possible. However, if you can’t, a pretest-posttest design can help you to gain information that you otherwise couldn’t. Consider the strengths and weaknesses carefully and talk to us if you need more help!
“Is our lecture series downloaded by enough people to be worth the cost?”
“Which areas of this program could use improvement, and how?”
“Does my video increase people’s knowledge of animal abuse?”
We have included this separate section for program evaluation because many people are familiar with the term and want to conduct one, but the first thing to know about program evaluation is that it isn’t one thing. Any of the types of study used above–surveys, interviews and focus groups, and experiments–can all be used for program evaluation.
Go back to Step 2 and think about your key questions for your program evaluation, then determine which method or methods you’ll need to answer them. Wikipedia also has a much more detailed description of different aspects and approaches to program evaluation.
Finally, if a simple program evaluation survey is all you’re looking for, we have an example you can follow.
Step 4: Identifying Your Participant Population & Determining Your Sample Size
The population of (potential) participants is the group of people you want to draw conclusions about. For many studies, the answer will simply be “the adult population of your country,” and this is the population that panel companies are set up to target.
Before determining how many participants you need, remember that any percentage or number you get in your survey—no matter how well designed!—is not likely to be exactly the same as it would be in the population. It is an estimate of the population value, and there will be a margin of error that tells you how accurate it is.
Often, surveys reported by polling firms like Gallup have a margin of error of 3%, which means that the percentages they report are accurate within ±3%. Faunalytics also aims for a 3% margin of error on our population estimates. If you want the same level of accuracy, your survey will need a minimum of 1,068 participants.
If that sounds like a lot, the lowest you might want to go is 385 participants, for a margin of error of 5%. It’s easy to calculate these yourself using a sample size calculator like this one. The numbers above are based on estimating for the U.S. population with 95% confidence (95% is standard and probably what you should use).
Example: If you ask 385 randomly-selected people in the U.S. for their favorite animal and half of them (50%) say cats, you can assume with 95% confidence that the percentage of people in the U.S. whose favorite animal is a cat is between 45% and 55% (50% ±5%).
Note that surveying only 385 people makes for a pretty wide range of possible values! How many participants you need depends on how accurate you need to be.
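The sample sizes above come from the standard margin-of-error formula for a proportion, n = z²·p(1−p)/E², with the most conservative assumption p = 0.5. A quick sketch:

```python
import math

def sample_size(margin_of_error, confidence=0.95, p=0.5):
    """Minimum sample size to estimate a population proportion within
    +/- margin_of_error, for a large population."""
    # z-score for the chosen two-sided confidence level.
    z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}[confidence]
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

sample_size(0.05)  # -> 385 participants for a 5% margin of error
sample_size(0.03)  # -> 1068 participants for a 3% margin of error
```

Halving the margin of error roughly quadruples the required sample, which is why accuracy gets expensive quickly.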
Interviews Or Focus Groups
Recommendations vary for the minimum number of participants in a qualitative study, but they come down to the idea of saturation. You want enough participants to reach the point where no new information or themes would emerge if you talked to additional people. This principle depends on an iterative process in which interviews are conducted, the data are analyzed, then more interviews are conducted, and so on. For more on saturation and how to achieve it, this editorial provides a succinct summary (Morse, 1995).
Iteratively looking for data saturation is the best way to determine sample size, but it doesn’t give a good rule of thumb. Recommendations vary a lot, but most commonly, researchers and journals recommend between 20 and 50 participants for interviews (e.g., here and here). However, as few as 12 interviews have been enough to achieve saturation for some topics (Guest, Bunce, & Johnson, 2006). We recommend that you aim for at least 20, but if you can’t get that many, it may still be enough. Try to limit your topic if you have fewer participants.
The best sample size is more difficult to determine for focus groups. You can’t just look at the total number of participants, because you get less information from each participant in a focus group than in an interview (sometimes much less). For that reason, it is better to use interviews if possible. If you have to use focus groups for some reason, a large number of small groups is better than a small number of large groups.
The number of participants needed for an experiment depends on the design and the size of the difference(s) you want to be able to see. For example, if the video condition reports eating 2% less pork than the control condition, is that still meaningful to you? You may decide that you wouldn’t continue funding your video outreach for a difference of less than 5%.
The smaller the potential difference you care about, the more participants you’ll need. If you want to be able to find a small difference, assume that you’ll probably need about 310 participants per condition. For a larger difference, you might only need 100 per condition. To find out exactly, you will need someone to run a power analysis once you’ve specified your design.
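For a rough sense of what a power analysis computes, here is the common normal-approximation formula for two equal-sized groups, expressed in terms of Cohen’s d (a standardized difference). This is only a sketch under simplifying assumptions; the exact numbers for your study depend on your design, so have someone run a proper power analysis.

```python
import math

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect a standardized
    difference of effect_size_d (Cohen's d) between two equal groups,
    using the normal-approximation power formula."""
    z_alpha = {0.05: 1.960, 0.01: 2.576}[alpha]    # two-sided test
    z_beta = {0.80: 0.8416, 0.90: 1.2816}[power]   # desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)

n_per_group(0.2)  # "small" effect by Cohen's conventions: ~393 per group
n_per_group(0.5)  # "medium" effect: ~63 per group
```

The smaller the effect you need to detect, the larger the sample, which is why pinning down the minimum difference you care about (Step 2) matters so much.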