Why Methodology Matters More Than Headlines
Every few months, a new service dog study circulates through my professional network. The headline promises something bold: veterans with PTSD service dogs show dramatic symptom reduction, or autism service dogs transform family quality of life. The sharing starts immediately, and the citation chains multiply across nonprofit newsletters, advocacy websites and congressional testimony.
I read those headlines differently than most people in this field. After 15 years working directly in service dog training, animal behavior and nonprofit healthcare operations as Executive Director of the TheraPetic® Healthcare Provider Group, I have learned that a headline measures marketing, not science. What measures science is what most people skip entirely: the methodology section.
Service dog research methodology is not an academic technicality. It determines whether the conclusions the paper draws are actually supported by the data collected. When practitioners, policymakers and funders cite flawed studies to justify clinical decisions or legislative action, real people are affected. I think the field owes its clients a much more rigorous reading of this literature than it currently delivers.
This post walks through how I personally evaluate service dog efficacy studies. I will cover the specific flaws I encounter most often, explain why self-report bias is a particularly serious problem in PTSD service dog research and describe what I consider the minimum criteria for a study worth citing in a clinical or policy context.
The Sample Size Problem in Service Dog Studies
The single most common flaw I encounter in service dog research is an underpowered sample. A study is underpowered when its participant count is too small to reliably detect a true effect even if one exists. Most service dog efficacy studies I review work with samples of between 20 and 60 participants. Some are smaller.
That is not inherently disqualifying. Pilot studies with small samples serve a legitimate purpose: they test feasibility, refine measurement instruments and generate hypotheses for larger confirmatory trials. The problem is that in service dog research, pilot studies routinely get treated as confirmatory evidence. The distinction between exploratory and confirmatory research collapses, and the field builds its evidentiary base on preliminary data presented as settled conclusion.
Power analysis should be reported in any study making efficacy claims. A properly powered study for detecting a moderate effect size in a PTSD symptom reduction trial typically requires 80 or more participants per arm, depending on the outcome measure and the variability of the population. When a paper reports significant results from a sample of 28 participants with no power calculation disclosed, the appropriate clinical response is skepticism, not citation.
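To make the power argument concrete, here is a minimal Python sketch of the standard normal-approximation sample size formula for comparing two group means. It assumes a two-sided test at α = 0.05 and treats "moderate" as a standardized effect size of d = 0.5 (Cohen's conventional benchmark). The function names are illustrative, not taken from any statistics package, and real trials often use t-based or repeated-measures calculations, which give slightly larger numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def n_per_arm(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants needed per arm for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2,
    where d is the standardized effect size (Cohen's d).
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


def approx_power(d: float, n_each: int, alpha: float = 0.05) -> float:
    """Approximate power of the same test with n_each participants in each arm."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return STD_NORMAL.cdf(d * sqrt(n_each / 2) - z_alpha)


print(n_per_arm(0.5))                    # moderate effect, 80% power
print(n_per_arm(0.5, power=0.90))        # moderate effect, 90% power
print(round(approx_power(0.5, 14), 2))   # 28 participants split 14/14
```

Under these assumptions the formula gives about 63 participants per arm at 80% power and 85 per arm at 90% power, before any allowance for attrition. If a 28-person study split its sample evenly, the approximation puts its power to detect d = 0.5 at roughly 26%, so a real moderate effect would be missed about three times out of four.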
The recruitment challenges are real, and I do not dismiss them. Recruiting trained service dog teams for controlled research is genuinely difficult. Wait lists for legitimate service dog programs run 18 to 36 months in many cases. The population of working teams at any given moment is limited. These constraints do not disappear because a researcher needs a larger sample. What they argue for is more funding, longer timelines and multi-site collaborations, not acceptance of underpowered conclusions.
Self-Report Bias and PTSD Service Dog Research
Self-report bias is the most structurally embedded problem in PTSD service dog efficacy studies, and it is the one I see discussed least frequently in public discourse about this research.
Here is the mechanism. A veteran waits two years for a service dog. They invest enormous emotional energy in the placement. They may have fundraised for it, advocated publicly for it or built an identity around their working partnership. When a researcher then asks them to complete a PCL-5 or a PHQ-9 four months into the partnership, the psychological pressure to report improvement is intense. That pressure is not dishonesty. It is a cognitive process called expectation bias, and it operates largely below the level of conscious awareness.
Expectation bias inflates self-reported outcomes in almost every domain where participants have strong prior investment in the intervention. In PTSD research specifically, where the primary outcome measures are almost exclusively self-reported symptom scales, there is no external validator available to cross-check subjective experience. A clinician-rated measure like the CAPS-5 (Clinician-Administered PTSD Scale for DSM-5), scored by an assessor who does not know the participant's group, offers more protection than pure self-report, but even CAPS-5 administration is subject to social desirability effects when the participants themselves cannot be blinded.
Participant blinding is essentially impossible in service dog research. You cannot conceal from a participant whether they have a service dog. What researchers can do is use structured clinical interviews administered by evaluators who have no relationship with the participant's placement organization, include physiological measures like cortisol, heart rate variability or actigraphy as secondary outcomes and build in longer follow-up periods of 12 months or more to assess whether reported gains persist beyond the novelty effect.
I have reviewed studies where the enthusiasm for the results rested entirely on PCL-5 scores collected at 8 weeks post-placement, with no blinded assessment, no physiological measurement and no follow-up beyond that single time point. Those studies do not tell me whether service dogs reduce PTSD symptoms. They tell me that people who receive service dogs report feeling better at 8 weeks. That is a different and much weaker claim.
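The gap between those two claims can be shown with a toy simulation. The Python sketch below assumes, purely for illustration, that expectation bias adds a constant number of points to every self-reported improvement score in the service dog group. The scale, noise level and bias size are invented and do not describe any real study. The point is structural: a design that only observes reported scores gets the same answer whether the improvement is real or entirely a reporting shift.

```python
import random
from statistics import mean


def apparent_effect(true_effect: float, expectation_bias: float,
                    n_per_group: int = 10_000, noise_sd: float = 10.0,
                    seed: int = 0) -> float:
    """Mean self-reported improvement, service dog group minus comparison group.

    Toy model (all parameters hypothetical): each person's true improvement is
    normally distributed; service dog handlers then add a constant
    expectation_bias when reporting it. Only reported scores are observed.
    """
    rng = random.Random(seed)
    comparison = [rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
    true_dog = [rng.gauss(true_effect, noise_sd) for _ in range(n_per_group)]
    reported_dog = [x + expectation_bias for x in true_dog]
    return mean(reported_dog) - mean(comparison)


# A real 5-point benefit with unbiased reporting...
real = apparent_effect(true_effect=5.0, expectation_bias=0.0)
# ...versus no benefit at all plus a 5-point reporting shift.
artifact = apparent_effect(true_effect=0.0, expectation_bias=5.0)
print(round(real, 2), round(artifact, 2))
```

Both calls return the same value, close to 5, because with the same random draws the two scenarios produce identical reported scores. Separating them requires an outcome the reporting shift cannot reach, which is why physiological measures and independent evaluators matter.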
Control Group Design Failures
Control group design in service dog research is genuinely hard, and I have some sympathy for researchers navigating this problem. You cannot ethically withhold a potentially beneficial intervention from someone on a waitlist indefinitely. The comparison condition shapes what question you are actually answering, and most studies answer a much narrower question than their conclusions claim.
The most common design I encounter pairs a service dog group against a waitlist control. This is not a true control condition. It is a comparison between people who currently have something and people who are waiting for it. Waitlist participants know they are waiting. That knowledge affects mood, treatment engagement and self-reported outcomes independent of the intervention. The comparison confounds the effect of having a service dog with the effect of anticipating one.
A more rigorous comparison condition would be a trained companion dog placed with the same procedural support, attention and handler training as the service dog group. This isolates the trained task component from the animal bonding and social support components of the intervention. To my knowledge, very few published trials have attempted this design, partly because of cost and partly because it requires the research team to operationalize exactly what constitutes a "service task" in the experimental condition, which forces definitional clarity that some researchers prefer to avoid.
Active comparison conditions matter because the therapeutic mechanism of service dog partnerships is genuinely unknown. Is the benefit driven by trained task interruption of anxiety spirals? Is it driven by the social facilitation effect of moving through public spaces with a dog? Is it driven by increased physical activity, structured routine or reduced social isolation? Without a comparison condition that isolates one of these mechanisms, efficacy studies cannot answer those questions. They can only describe an association between having a service dog and reporting improvement, which is a very different and much less actionable finding.
What Rigorous Service Dog Research Actually Looks Like
I want to be fair to the researchers doing this work. There is a growing body of work that takes methodology seriously, and it deserves acknowledgment alongside the critique.
The service dog research portfolio funded through the U.S. Department of Veterans Affairs represents the most serious investment in service dog research infrastructure that I have seen in my career. Multi-site randomized controlled trials with pre-registered protocols, CAPS-5 primary outcomes and 12-month follow-up periods are a substantively different category of evidence than most of what preceded them. Pre-registration matters enormously because it prevents researchers from choosing favorable outcomes after data collection, or from presenting hypotheses formed after seeing the data as if they had been planned in advance. The second practice is known as HARKing (Hypothesizing After the Results are Known), and both inflate apparent significance in published literature.
When I evaluate a study, I look for the following minimum criteria before I consider citing it in any clinical, policy or educational context at TheraPetic® Healthcare Provider Group:
- Pre-registration on ClinicalTrials.gov or an equivalent registry prior to data collection
- Disclosed power analysis with justified sample size targets
- At least one outcome measure beyond pure self-report
- A comparison condition more rigorous than waitlist-only
- Follow-up assessment at least 6 months after the initial post-placement measurement
- Transparent reporting of attrition and its potential effect on results
- Conflict of interest disclosure from all authors
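One way to apply these criteria consistently is to treat them as a literal checklist. The Python sketch below is a hypothetical encoding: the StudyReport fields, the function names and the rule that a study must meet all seven criteria to count as citable are illustrative assumptions, not a published appraisal instrument.

```python
from dataclasses import dataclass


@dataclass
class StudyReport:
    """What a paper discloses, coded by the reader. Field names are hypothetical."""
    preregistered_before_data_collection: bool
    power_analysis_disclosed: bool
    has_non_self_report_outcome: bool
    comparison_beyond_waitlist: bool
    followup_months_after_initial: float  # months past the first post-placement measurement
    attrition_reported: bool
    conflicts_disclosed_all_authors: bool


def unmet_criteria(study: StudyReport) -> list[str]:
    """Return the minimum criteria this study fails, in checklist order."""
    checks = {
        "pre-registration before data collection": study.preregistered_before_data_collection,
        "disclosed power analysis": study.power_analysis_disclosed,
        "outcome beyond pure self-report": study.has_non_self_report_outcome,
        "comparison more rigorous than waitlist-only": study.comparison_beyond_waitlist,
        "follow-up of at least 6 months": study.followup_months_after_initial >= 6,
        "transparent attrition reporting": study.attrition_reported,
        "conflict of interest disclosure": study.conflicts_disclosed_all_authors,
    }
    return [name for name, passed in checks.items() if not passed]


def evidence_role(study: StudyReport) -> str:
    """Citable only if every criterion is met; otherwise hypothesis-generating."""
    return "citable" if not unmet_criteria(study) else "hypothesis-generating"


# A hypothetical waitlist-controlled study with 8-week self-report only.
typical = StudyReport(False, False, False, False, 0, True, True)
print(evidence_role(typical))
print(unmet_criteria(typical))
```

Returning the list of failed criteria, rather than a single score, keeps visible why a given study is being treated as hypothesis-generating.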
Most published service dog studies do not meet all of these criteria. That does not mean they have no value. It means I treat them as hypothesis-generating rather than hypothesis-confirming, which changes how I communicate findings to clients, policymakers and the media.
How I Read a Service Dog Study Before Citing It
My actual reading process is sequential and somewhat ruthless. I start with the methodology section, not the abstract. Abstracts are written for press releases. Methodology sections are written for scientists.
I check the sample size first and look immediately for a power calculation. If the sample is under 50 participants and no power calculation is reported, I flag the study as pilot-level evidence regardless of how its authors characterize it.
I check whether the primary outcome was specified before data collection began or whether it appears to have been selected after the fact. Post-hoc primary outcome selection is one of the most reliable markers of inflated significance in clinical research. A pre-registration timestamp on ClinicalTrials.gov solves this problem cleanly.
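A short calculation shows why post-hoc outcome selection is such a reliable warning sign. If a study measures several outcomes, none of which is truly affected, and reports whichever one happens to reach p < 0.05, the chance of at least one false positive grows quickly. The Python sketch below assumes the outcomes are independent, which somewhat overstates the inflation for correlated measures such as two PTSD scales, and checks the closed-form answer against a simulation that relies on p-values being uniformly distributed when there is no true effect.

```python
import random


def chance_of_false_positive(k_outcomes: int, alpha: float = 0.05) -> float:
    """P(at least one of k independent null outcomes has p < alpha)."""
    return 1 - (1 - alpha) ** k_outcomes


def simulated_rate(k_outcomes: int, alpha: float = 0.05,
                   trials: int = 50_000, seed: int = 1) -> float:
    """Monte Carlo check: under the null hypothesis each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = sum(
        min(rng.random() for _ in range(k_outcomes)) < alpha
        for _ in range(trials)
    )
    return hits / trials


for k in (1, 5, 10):
    print(k, round(chance_of_false_positive(k), 3), round(simulated_rate(k), 3))
```

With five candidate outcomes, the chance that at least one looks significant by luck alone is about 23%, and with ten it is about 40%, compared with the nominal 5% a reader assumes when a single pre-specified primary outcome is reported.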
I look at the attrition tables. Service dog studies frequently lose a substantial percentage of participants between enrollment and follow-up. How those dropouts are handled analytically matters. Intent-to-treat analysis, which includes all participants regardless of whether they completed the protocol, is the conservative and appropriate default. Per-protocol analysis that excludes non-completers can inflate apparent efficacy substantially.
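The direction of that inflation is easy to see with invented numbers. The Python sketch below assumes a single service dog arm of ten participants, three of whom drop out before follow-up, and assumes the dropouts were people who were not improving. For the intent-to-treat estimate it uses baseline observation carried forward (counting a dropout as zero improvement), which is only one of several ways to handle missing data under intent-to-treat; multiple imputation is often preferred in practice.

```python
from statistics import mean

# Hypothetical improvement scores (points of symptom reduction) for one arm.
# None marks a participant lost before follow-up. Numbers are invented.
arm = [12, 10, None, 15, 8, None, 11, 9, None, 13]


def per_protocol_mean(scores):
    """Average over completers only; non-completers are simply dropped."""
    return mean(s for s in scores if s is not None)


def itt_bocf_mean(scores):
    """Average over everyone enrolled, counting each dropout as zero improvement
    (baseline observation carried forward)."""
    return mean(0 if s is None else s for s in scores)


print(f"per-protocol: {per_protocol_mean(arm):.1f} points")
print(f"intent-to-treat (BOCF): {itt_bocf_mean(arm):.1f} points")
```

Same participants, same data: the per-protocol figure (about 11.1 points) is more than 40% larger than the intent-to-treat figure (7.8 points) because the three non-completers were removed from the analysis.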
I check the funder disclosures. Research funded exclusively by service dog placement organizations deserves additional scrutiny because the funder has a direct financial and reputational interest in positive outcomes. That does not disqualify the research, but it raises the bar for how I weight conclusions.
The training standards applied to service dogs in the study also matter to me as a CSDT. Service dog teams trained to inconsistent or unverified standards introduce heterogeneity that makes it impossible to know what intervention was actually delivered. If a study enrolled service dogs from 12 different programs with no standardized training verification protocol, the "service dog" variable is not a controlled variable at all. It is a category containing enormous variation, and interpreting results as if the intervention were uniform is a category error.
For trainers interested in deepening their research literacy, the International Association of Canine Professionals and the Certification Council for Professional Dog Trainers both offer educational resources on evidence-based practice frameworks, though neither yet provides formal research methodology curricula specifically for service dog contexts. That gap in professional education is one I consider significant.
What Weak Evidence Means for Policy
The research literacy problem has direct policy consequences. Congressional testimony about PTSD service dog efficacy regularly cites studies that would not pass a peer methodology review. HUD guidance on assistance animals draws on an evidentiary base that is thinner than most people in housing policy realize. State legislatures considering service dog access legislation frequently hear from advocates presenting preliminary data as settled science.
I am not arguing that service dog programs lack value. My clinical experience across 15 years and my direct observation of handler transformation in training contexts at officialservicedog.com Training Plus give me genuine conviction that well-trained service dog partnerships produce meaningful benefit for many handlers. My argument is that conviction based on clinical experience is a different evidentiary category than conviction based on randomized controlled trial data, and the field should be honest about which category supports which claims.
When advocates overstate the evidence base to win policy arguments, they create fragility. A subsequent rigorous trial that fails to replicate preliminary findings does not just correct the record. It undermines credibility across the entire field, including the legitimate and well-documented benefits that careful research has established.
The service dog community deserves a research base that can withstand scrutiny. Building that base requires researchers with adequate funding, longer timelines, multi-site collaborations and honest engagement with methodological limitation. It requires practitioners and advocates willing to say "the evidence on this specific question is preliminary" when that is the accurate characterization. And it requires policy audiences sophisticated enough to ask, before citing a study, what the methodology actually says.
That last requirement is the one I have the most direct ability to influence. Teaching research literacy within this field is something I consider part of the long-term mission of responsible service dog practice, not an academic indulgence. The clients we serve deserve no less.
