No Better Than Flipping A Coin?: Behavioral Evaluations Of Shelter Dogs
Many shelters around the world use behavior evaluations for dogs brought in or relinquished for adoption. These evaluations typically consist of a series of sub-tests, each presenting a stimulus, such as a loud noise or a person dressed in a costume, and observing how the dog responds. The evaluations are meant to help shelters identify “dangerous dogs” (dogs at significant risk of harming others by biting) and to avoid “failed adoptions,” in which an initial adopter returns the dog to the shelter.
As this study points out, however, “dangerous dogs” are rare in shelter environments and can usually be identified without administering any kind of behavior test, and “failed adoptions” are not a serious concern. A returned adoption does not seem to hurt the dog’s later adoptability, and may in fact give the dog a brief break from the shelter environment, much like a foster care placement.
The paper analyzes 17 studies of canine behavioral evaluations that report a measure of reliability or validity, in order to determine whether such evaluation tools should be applied in shelter environments. The authors conclude that the evaluations lack the scientific validity that people seem to attribute to them, and that this can translate into negative outcomes for dogs in shelters. For example, a shelter worker who uses a behavioral evaluation to decide whether to euthanize a dog rather than put her up for adoption is making a life-or-death decision on the basis of a test that has not been validated and may not predict that dog’s future behavior.
The authors identify a number of problems with the existing literature on canine behavior evaluations. First, scientific and colloquial meanings of words like “predictive,” “validated,” and “reliable” are used interchangeably, potentially causing confusion about how stringently an evaluation has been vetted. In reality, neither inter-shelter nor inter-evaluator reliability has been established, nor has it been shown that the tested behaviors reflect consistent personality traits in dogs. No previous researcher has managed to show that the evaluations measure what they purport to measure: it remains an open question how and why behaviors displayed during a test relate to a dog’s success in an adoptive home. Furthermore, the tests differ substantially from one another, and there is no single validated test against which new tests can be compared to establish their relative validity.
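To make the idea of “inter-evaluator reliability” concrete, here is a minimal sketch, not drawn from the paper and using invented pass/fail labels, of how agreement between two evaluators scoring the same dogs could be quantified with Cohen’s kappa, a standard statistic that discounts the agreement two raters would reach by chance alone:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(rater_a)
    # Raw agreement: fraction of items the raters labeled identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater assigned labels independently,
    # at their own observed rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum((counts_a[lab] / n) * (counts_b[lab] / n) for lab in labels)
    return (observed - expected) / (1 - expected)

# Invented pass/fail scores from two evaluators testing the same ten dogs.
rater_1 = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
rater_2 = ["pass", "fail", "fail", "pass", "pass", "pass", "pass", "fail", "fail", "pass"]
print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")  # ~0.35, despite 70% raw agreement
```

Under these invented labels the two raters agree on 70% of the dogs, yet kappa is only about 0.35, because much of that raw agreement is what chance alone would produce. Establishing reliability means demonstrating high chance-corrected agreement like this across evaluators and shelters, which the literature reviewed here has not done.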
The authors also note that these tests are essentially claiming predictive ability, meaning that they can tell evaluators how dogs will behave in the future. Unfortunately, the sub-tests may provoke reactions that stem from the testing situation itself and from the dog’s discomfort with the shelter environment. As a result, the tests produce high rates of false negatives and false positives. For example, a dog may growl at an evaluator because he is nervous in the shelter, and be labeled “aggressive” as a result, even though he would not growl in a home where he is comfortable (a false positive).
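To see why false positives dominate when truly dangerous dogs are rare, consider a back-of-the-envelope calculation; the sensitivity, specificity, and prevalence figures below are illustrative assumptions, not numbers reported in the paper:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(truly dangerous | flagged as dangerous), via Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Illustrative assumptions: a seemingly decent test (85% sensitivity and
# 85% specificity) applied in a population where 5% of dogs are dangerous.
ppv = positive_predictive_value(sensitivity=0.85, specificity=0.85, prevalence=0.05)
print(f"PPV = {ppv:.0%}")  # ~23%: most flagged dogs are false positives
```

Under these assumed numbers, fewer than a quarter of the dogs the test flags as dangerous would actually be dangerous. That is the intuition behind asking whether such a test does better than flipping a coin: when the trait being screened for is rare, even a test that looks accurate on paper mostly flags dogs who pose no risk.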
In the end, the authors conclude that 25 years of research into these behavior evaluations does not suggest that they are useful predictive tools, nor does it seem likely that their validity will improve in the near future. As such, it is not clear that the evaluations do any good, and they risk doing serious harm to dogs in shelters. This is especially true given that shelter environments, and the ways shelters adapt and administer even the same test, differ widely, which suggests that a test developed for one environment may not be valid in another. The authors therefore argue that these evaluations should be abandoned. At the very least, research into such evaluations should be presented in a standardized way that includes the tools’ error rates, and researchers should be careful about using words like “reliable” or “validated” when writing about them.
For animal advocates, this may be a surprising, and indeed stunning, conclusion. It’s never comfortable to realize that we have been using ineffective tools. The results of this study, and the book to follow, are sure to spark debate in companion animal advocacy circles; it may be a debate that is long overdue.
