How Anti-Psychiatry Researchers Attack Antidepressants With Faulty Statistics


Anti-psychiatry is a pseudoscience that downplays or rejects the existence and severity of psychiatric conditions, denies the efficacy of established treatments and demonizes medical doctors. Not all anti-psychiatry activists are committed to all three of these positions, but they are common beliefs within the movement. It is thus very reminiscent of the anti-vaccine movement, whose activists wrongly think that vaccine-preventable diseases are natural and not very harmful, reject vaccines and demonize pediatricians. In terms of debating tactics, anti-psychiatry activists make use of the same standard denialist toolkit: quoting scientists out of context, cherry-picking data, misunderstanding basic science and so on.

A recent paper by Jakobsen and colleagues (2017) claims to have shown that selective serotonin reuptake inhibitors (SSRIs), a class of antidepressants, have questionable clinical efficacy. It turns out that they base this claim on a piece of highly deceptive statistical trickery: they erect an arbitrary and evidence-free effect size threshold for clinical significance and then reject every treatment that does not meet it.

Because the threshold they picked is so large, it would also force them to reject psychotherapy and a considerable portion of the medications used in general medicine. The researchers cite the National Institute for Health and Care Excellence (NICE) as support for their criterion, but NICE dropped this criterion as flawed around eight years ago. In the end, SSRIs are an effective and useful treatment for depression (although they do not work perfectly for everyone), and clinical significance is a spectrum, not a black-and-white issue.

What are antidepressants and how effective are they?

Depression (also called major depression) is a psychiatric condition that involves feelings of emptiness, hopelessness, worthlessness and guilt, loss of interest in things that were previously pleasurable, sleep alterations and other symptoms. It is caused by a complex interaction between biological, psychological and social factors (Passer et al., 2009).

Antidepressants are a class of psychiatric medication used to treat depression. There are many different kinds, and over time they have improved in efficacy and decreased in side-effects as research and development has proceeded. One of the most common classes of antidepressants is the selective serotonin reuptake inhibitors (SSRIs), but there are many others as well.

SSRIs have been the subject of hundreds of efficacy and safety studies and have been approved by regulatory authorities in both the United States and the European Union (e. g. NICE, 2009b). Large-scale randomized controlled trials (RCTs) show that antidepressants are effective against depression, but they are not miracle medications that work perfectly for everyone (NICE, 2009b; Turner et al., 2008). Research has shown that the best available treatment involves a combination of antidepressants and psychotherapy (and perhaps some exercise), and that the two often work better together than either one does alone (e. g. Nemeroff et al., 2003).

Depression symptoms are typically measured with validated and reliable rating scales such as the Hamilton Rating Scale for Depression (HRSD), and the impact of a treatment is measured as the change in HRSD score from before to after treatment.

Individual studies can be informative, but the best available evidence comes from systematic reviews that use meta-analytic tools to combine data from many different studies. Provided the meta-analysis is carried out without methodological flaws or substantial bias, this pooling improves the accuracy of the conclusions.

If all studies use the HRSD, the meta-analytic effect size should be the difference in HRSD points (Horder et al., 2011). If some studies use different rating scales, it is suitable to use a standardized effect size such as Cohen’s d. Cohen’s d is calculated as ([average change in experimental group] – [average change in control group]) / [pooled standard deviation]. Using this as a meta-analytic effect size means calculating one Cohen’s d per study and then pooling the results from the individual studies, weighting each trial either by its sample size or by the inverse of its variance (1 / SE², where SE is the standard error of that trial’s effect size). An effect size of, say, d = 0.5 means that the difference between the average improvement in the two groups is half of a pooled standard deviation in favor of the experimental treatment.
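To make the recipe concrete, here is a minimal Python sketch of one way to carry it out: one Cohen’s d per trial, followed by an inverse-variance (fixed-effect) pooled estimate with a 95% confidence interval. The trial numbers are made up, the pooled standard deviation is assumed to be the pooled SD of the change scores, the variance of d uses the usual large-sample approximation, and refinements such as the small-sample correction (Hedges’ g) and random-effects models are left out. The names Trial, cohens_d and pooled_d are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Trial:
    """Summary statistics for one placebo-controlled trial (hypothetical numbers)."""
    mean_change_drug: float     # average HRSD improvement in the drug group
    mean_change_placebo: float  # average HRSD improvement in the placebo group
    sd_drug: float              # SD of the change scores in the drug group
    sd_placebo: float           # SD of the change scores in the placebo group
    n_drug: int
    n_placebo: int


def pooled_sd(t: Trial) -> float:
    """Pooled standard deviation of the change scores across both groups."""
    numerator = (t.n_drug - 1) * t.sd_drug ** 2 + (t.n_placebo - 1) * t.sd_placebo ** 2
    return math.sqrt(numerator / (t.n_drug + t.n_placebo - 2))


def cohens_d(t: Trial) -> float:
    """Between-group effect size: difference in average change over the pooled SD."""
    return (t.mean_change_drug - t.mean_change_placebo) / pooled_sd(t)


def d_variance(t: Trial) -> float:
    """Large-sample approximation to the sampling variance of d (i.e. SE squared)."""
    n1, n2, d = t.n_drug, t.n_placebo, cohens_d(t)
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def pooled_d(trials):
    """Fixed-effect pooled d, weighting each trial by 1 / SE^2, with a 95% CI."""
    weights = [1 / d_variance(t) for t in trials]
    estimate = sum(w * cohens_d(t) for w, t in zip(weights, trials)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return estimate, (estimate - 1.96 * se, estimate + 1.96 * se)


# Two made-up trials, each reporting HRSD change scores.
trials = [
    Trial(10.0, 8.0, 7.0, 7.0, 150, 150),
    Trial(11.0, 8.5, 8.0, 8.0, 100, 100),
]
estimate, (low, high) = pooled_d(trials)
print(f"pooled d = {estimate:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

The output is an effect size with error bars rather than a verdict, which is exactly what the conclusion of this post recommends looking for in a meta-analysis.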

Some anti-psychiatry activists (such as Kirsch and Sapirstein (1998) and Kirsch et al. (2008)) even calculate Cohen’s d incorrectly. They compute two faulty “effect sizes” per study (one per group) as ([before treatment] – [after treatment]) / [pooled standard deviation {using before and after SDs}], then pool all the treatment “effect sizes” and all the control “effect sizes” separately and subtract one from the other. This has been shown to systematically underestimate the efficacy of antidepressants (Horder et al., 2011).
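The difference between the two calculations is easiest to see side by side. The Python sketch below uses a single hypothetical trial (every number is invented for illustration) and assumes equal group sizes, so the pooled standard deviation is simply the root mean square of the two groups’ SDs. Because the two procedures divide by different standard deviations, they generally give different answers; with these particular numbers the faulty version comes out lower. How large the gap is in any one example depends on the SDs plugged in, so the sketch only illustrates the mechanics, not the size of the bias reported by Horder et al. (2011).

```python
import math

# One hypothetical trial with equal group sizes; none of these numbers are real data.
drug = {"pre": 25.0, "post": 15.0, "sd_pre": 4.0, "sd_post": 8.0, "sd_change": 5.0}
placebo = {"pre": 25.0, "post": 17.0, "sd_pre": 4.0, "sd_post": 8.0, "sd_change": 5.0}


def within_group_d(arm: dict) -> float:
    """Faulty per-group 'effect size': pre-post change over the pooled pre/post SD."""
    pooled_pre_post = math.sqrt((arm["sd_pre"] ** 2 + arm["sd_post"] ** 2) / 2)
    return (arm["pre"] - arm["post"]) / pooled_pre_post


def faulty_difference(drug: dict, placebo: dict) -> float:
    """The criticized procedure: drug-group 'effect size' minus placebo-group 'effect size'."""
    return within_group_d(drug) - within_group_d(placebo)


def between_group_d(drug: dict, placebo: dict) -> float:
    """Cohen's d: difference in average change over the pooled SD of the change scores.

    With equal group sizes the pooled SD is the root mean square of the two SDs.
    """
    diff = (drug["pre"] - drug["post"]) - (placebo["pre"] - placebo["post"])
    pooled = math.sqrt((drug["sd_change"] ** 2 + placebo["sd_change"] ** 2) / 2)
    return diff / pooled


print(f"faulty: {faulty_difference(drug, placebo):.2f}")
print(f"between-group d: {between_group_d(drug, placebo):.2f}")
```

With these numbers the sketch prints 0.32 for the faulty procedure and 0.40 for the between-group d, even though both start from the same two-point difference in average improvement.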

What is the effect size threshold gambit and why is it flawed?

This anti-psychiatry gambit is based on setting up a faulty threshold for clinical significance of d = 0.5, originally proposed in 2004 by the National Institute for Health and Care Excellence (NICE). The anti-psychiatry researchers then calculate (sometimes wrongly) the effect size for antidepressants and reject the treatment as “clinically insignificant” when it does not reach this cut-off. This gambit did not arise out of nowhere: a precursor gambit based on erroneously calculated effect sizes can be found in earlier papers (see next section).
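The gambit itself reduces to a one-line rule. As a minimal Python sketch (the function name gambit_verdict is hypothetical, and counting d = 0.5 exactly as passing is an assumption), it turns a continuous effect size into a binary label:

```python
# Cut-off used by the gambit (d = 0.5, from the 2004 NICE guideline).
CUTOFF = 0.5


def gambit_verdict(d: float) -> str:
    """Binary label the threshold gambit assigns to a single effect size."""
    return "clinically significant" if d >= CUTOFF else "clinically insignificant"


for d in (0.30, 0.49, 0.51, 0.70):
    print(f"d = {d:.2f}: {gambit_verdict(d)}")
```

Effect sizes of 0.49 and 0.51 are practically indistinguishable, yet they receive opposite verdicts, while 0.51 and 0.70 receive the same one. Nothing about confidence intervals, side-effects or the wider literature enters the rule at all.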

The effect size threshold gambit is a highly deceptive maneuver engineered to load the dice against psychiatric medication from the very start. This is because the tactic has several fatal flaws:

1. Arbitrary: the cut-off is completely arbitrary, since there is no reason to prefer d = 0.5 over some other value (Möller, 2008). This is the same problem that faces the incessant obsession with getting a p value under 0.05.

2. No evidence: there are no scientific studies that have established that d = 0.5 is a generally reliable indicator of clinical significance (Möller, 2008).

3. Black-and-white: it assumes that clinical significance is a black-and-white issue rather than a matter of degree (Hegerl and Mergl, 2010).

4. Ignores context: it assumes that clinical significance can easily be determined from just looking at effect sizes in relatively short-term studies without taking into account the broader scientific context (Hegerl and Mergl, 2010).

5. Would reject psychotherapy: since psychotherapy has an effect size that is below d = 0.5, this gambit would also lead you to reject psychotherapy (Cuijpers et al., 2010; DeRubeis, Siegle and Hollon, 2008). Since most anti-psychiatry activists do not reject psychotherapy (although some do), their position would instantly detonate as self-contradictory.

6. Would reject much of general medicine: the threshold is set so high that it would force us to reject a considerable portion of the treatments used in general medicine. In general, psychiatric medications and medications used in general medicine are comparable in efficacy (Leucht et al., 2012).

7. NICE rejects the cut-off: the cut-off in question comes from the NICE Clinical guideline [CG23] called “Depression: management of depression in primary and secondary care” from 2004 (NICE, 2004a; NICE, 2004c, p. 41). However, this guideline was replaced by the updated NICE Clinical guideline “Depression in adults: recognition and management” from 2009, which does not contain this recommendation (NICE, 2004b; NICE, 2009a). It is also not included in the full recommendations (NICE, 2009b). In the new guidelines, clinical significance is treated as a spectrum, not an arbitrary cut-off. Thus, referencing the NICE guidelines from 2004 means referencing material that is 13 years old and 8 years out of date. The Jakobsen et al. (2017) paper uses the old name of NICE, namely National Institute of Clinical Excellence, which suggests that the authors may not even have visited the NICE website to check its current clinical guidelines.
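One way to see why a spectrum is more informative than a cut-off (points 3 and 7) is to translate effect sizes into numbers needed to treat (NNT). The Python sketch below uses a standard conversion, usually credited to Kraemer and Kupfer, which assumes normally distributed outcomes with equal variances in both groups; it is an illustrative technique swapped in here, not a calculation taken from the papers discussed above. It needs Python 3.8 or later for statistics.NormalDist, and the function name nnt_from_d is hypothetical.

```python
from statistics import NormalDist


def nnt_from_d(d: float) -> float:
    """Convert a positive Cohen's d into a number needed to treat (NNT).

    Assumes normally distributed outcomes with equal variances in both groups.
    """
    if d <= 0:
        raise ValueError("this conversion is only used here for a positive effect")
    # Probability that a randomly chosen treated patient does better than a
    # randomly chosen control patient under the normal, equal-variance model.
    auc = NormalDist().cdf(d / 2 ** 0.5)
    return 1 / (2 * auc - 1)


for d in (0.3, 0.4, 0.49, 0.5, 0.6):
    print(f"d = {d:.2f}  ->  NNT of about {nnt_from_d(d):.1f}")
```

The NNT changes smoothly as d moves across 0.5; nothing special happens at the cut-off. NNT is still only one piece of the context, alongside confidence intervals, side-effects and quality of life.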

How the effect size threshold gambit has evolved

Jakobsen et al. (2017) is not the first paper to deploy the effect size threshold gambit (and its precursor, the erroneously calculated effect size). It tends to return about once a decade. It seems to have first appeared in Kirsch and Sapirstein (1998), which was refuted by Klein (1998). It returned in Kirsch et al. (2008) and was refuted by data from Turner et al. (2008) and arguments in Turner and Rosenthal (2008).

Conclusion

When looking at systematic reviews and meta-analyses of antidepressant efficacy, identify what effect size (and error bars) the researchers found, whether this effect size was calculated correctly and whether any arbitrary standard for clinical significance was used in a deceptive way. Also compare their findings with the broader conclusions of the scientific literature on the subject.


References and further reading:

Nemeroff, C. B. et al. (2003). Differential responses to psychotherapy versus pharmacotherapy in patients with chronic forms of major depression and childhood trauma. PNAS. 100 (24) 14293-14296.

Cuijpers, P., van Straten, A., Bohlmeijer, E., Hollon, S. D. and Andersson, G. (2010). The effects of psychotherapy for adult depression are overestimated: a meta-analysis of study quality and effect size. Psychol Med. 40(2):211-23.

DeRubeis, R. J., Siegle, G. J. and Hollon, S. D. (2008). Cognitive therapy versus medication for depression: treatment outcomes and neural mechanisms. Nature Reviews Neuroscience 9, 788-796.

Hegerl, U. and Mergl, R. (2010). The clinical significance of antidepressant treatment effects cannot be derived from placebo-verum response differences. Journal of Psychopharmacology. 24(4) 445–448.

Horder, J., Matthews, P. and Waldmann, R. (2011). Placebo, prozac and PLoS: significant lessons for psychopharmacology. J Psychopharmacol. 25(10):1277-88.

Jakobsen, J. C. et al. (2017). Selective serotonin reuptake inhibitors versus placebo in patients with major depressive disorder. A systematic review with meta-analysis and Trial Sequential Analysis. BMC Psychiatry. 17:58.

Kirsch, I. and Sapirstein, G. (1998). Listening to Prozac but hearing placebo: A meta-analysis of antidepressant medication. Prevention & Treatment, Vol 1(2).

Kirsch, I., Deacon, B. J., Huedo-Medina, T. B., Scoboria, A., Moore, T. J., et al. (2008). Initial Severity and Antidepressant Benefits: A Meta-Analysis of Data Submitted to the Food and Drug Administration. PLoS Med 5(2): e45.

Klein, D. F. (1998). Listening to meta-analysis but hearing bias. Prevention & Treatment, Vol 1(2).

Leucht, S., Hierl, S., Kissling, W., Dold, M., Davis, J. M. (2012). Putting the efficacy of psychiatric and general medicine medication into perspective: review of meta-analyses. The British Journal of Psychiatry. 200 (2) 97-106.

Möller, H. J. (2008). Isn’t the efficacy of antidepressants clinically relevant? A critical comment on the results of the metaanalysis by Kirsch et al. 2008. European Archives of Psychiatry and Clinical Neuroscience. 258 (8), 451–455.

NICE. (2004a). Depression: management of depression in primary and secondary care (before removal). Accessed: 2017-02-18.

NICE. (2004b). Depression: management of depression in primary and secondary care (current website). Accessed: 2017-02-18.

NICE. (2004c). Depression: management of depression in primary and secondary care (full guidelines, outdated). Accessed: 2017-02-18.

NICE. (2009a). Depression in adults: recognition and management. Accessed: 2017-02-18.

NICE. (2009b). Depression: the treatment and management of depression in adults (updated edition). Accessed: 2017-02-18.

Passer, M., Smith, R., Holt, N., Bremner, A., Sutherland, E., & Vliek, M. (2009). Psychology: The Science of Mind and Behavior. New York: McGraw-Hill Education.

Turner, E. H. and Rosenthal, R. (2008). Efficacy of antidepressants: Is not an absolute measure, and it depends on how clinical significance is defined. BMJ 336:516-7.

Turner, E. H., Matthews, A. M., Linardatos, E., Tell, R. A., & Rosenthal, R. (2008). Selective Publication of Antidepressant Trials and Its Influence on Apparent Efficacy. New England Journal of Medicine, 358(3), 252-260.

Emil Karlsson

Debunker of pseudoscience.

16 thoughts on “How Anti-Psychiatry Researchers Attack Antidepressants With Faulty Statistics”


  • February 21, 2017 at 19:26

    Without antidepressants, I would be long dead. I would not necessarily have actively killed myself, but I would certainly have died in an “accident” that I could easily have prevented. Are the meds perfect? No. But I am at least functional with them.

  • February 22, 2017 at 00:43

    What other kinds are there? I only know of SNRIs and Wellbutrin.

    • February 22, 2017 at 00:48

      There are many classes of antidepressants: SSRIs, SNRIs, NDRIs, atypical antidepressants, tricyclic antidepressants and MAOIs are the most common ones. The two latter classes are older ones that are not really used much today because they have been replaced by safer and more effective ones.

    • March 14, 2017 at 15:37

      I have written extensively about the problem with putting too much focus on p values: Why P-Values and Statistical Significance Are Worthless in Science.

      You can read more about how I think scientific data should be analyzed in The New Statistics: Why and How. Basically, the idea is to focus on effect sizes (without arbitrary cut-offs), confidence intervals, replication and the scientific context.

      You can read more about this kind of approach applied to antidepressants and anti-psychiatry claims in Is not an absolute measure, and it depends on how clinical significance is defined (try doing a Google search with quotation marks around the title and filetype:pdf to find a full text version).

      Your Twitter profile suggests that you are a pseudoscience activist (“yogi” and “psychic”), so I would recommend that you read Uncloaking the Deceptive Tactics Used by Alleged Psychics and What is Scientific Consensus and Why Should You Care?.

    • March 16, 2017 at 00:21

      You forgot “Big-wave legend”. I also charge 60 foot waves at Mavericks. It makes me feel alive.

      I tend to favor a Bayesian statistical approach but that is not a cure-all either.
      This argument you made against this study can be made against any statistics based study. The choice of a clinically significant effect size is arbitrary, as is the choice of an alpha level. We can always argue that the choices are unreasonable. Aren’t you obliged to attack almost all statistics based research as arbitrary just as you did this study?

    • March 16, 2017 at 00:57

      I already do. Check the link to the article mentioned above where I explicitly argue that p values and statistical significance are worthless in science.

      Thankfully, a lot of scientific research focuses on effect sizes, confidence intervals, convergence of evidence from different experiments, the scientific context and replication. That is good science. Something that neither p value abusers nor anti-psychiatry activists generally do.

    • March 16, 2017 at 06:21

      You can’t be making that argument or else you’d be arguing against almost all of the studies used to approve SSRI’s. If you did that, this meta-analysis wouldn’t be necessary because none of these drugs would be approved in the first place. Drugs, especially ones that treat idiopathic illness, are approved almost entirely based on arbitrary alpha levels and effect sizes.

      Fundamentally I see our difference and it may be because of our locations. In the USA medications must be proven to be safe and effective. In other words, the burden of proof is on the producers of drugs to continuously prove their drugs are safe and effective. You are arguing that this meta-analysis does not prove that they are not effective. I am arguing that this meta-analysis shows that existing studies have not proven that these drugs are effective. The manufacturers have not met their burden of proof.

      The Hamilton scale is deeply flawed for statistical analysis. Any reasonable scale of depression would weigh suicide behavior as orders of magnitude more important than eating behavior. Also the measured variables, sleeping, eating, mood, guilt etc are not independent. There is a lot of subjectivity and measurement error as well. Nonetheless we are often forced to make decisions based on flawed data. It adds a layer of uncertainty that we must account for when we set a standard for proof.
      Given that uncertainty, an effect size of 2.25 on the HRSD could mean next to nothing. Therefore we are forced to assume that it does mean next to nothing until proven otherwise. Why? Because the burden of proof is on those trying to demonstrate the drugs are effective. They haven’t met their burden.

    • March 16, 2017 at 14:49

      I am not arguing against all of statistics. I am, in general, arguing against focusing too much on arbitrary cutoffs and have provided an alternative statistical approach that is often used in medicine already. Many studies that look at SSRIs do not naively focus exclusively on p values, but consider effect sizes, confidence intervals and the scientific context.

      For this paper specifically, I am arguing that over-interpreting the results of this meta-analysis is flawed because it relies on an (1) arbitrary, (2) evidence-free and (3) outdated cutoff to make claims about clinical significance, which is a (4) spectrum and not an all-or-nothing thing. Furthermore, this approach cannot be sustained because it would force you to reject a ton of demonstrably effective treatments in not only (5) psychiatry but (6) medicine generally.

      You do not seem to understand my criticism and have chosen to straw man it as merely a complaint about the use of a cutoff. Therefore, I think you make arguments that are both deceptive and disingenuous.

      If you think the Hamilton scale is “deeply flawed”, then you cannot appeal to this meta-analysis to argue against the efficacy of antidepressants, since it uses the very scale you seem to be rejecting. Your position automatically self-destructs. Furthermore, many other meta-analyses show a much larger improvement than 2.25 points, yet you are determined to focus merely on studies that on the surface appear to support your position, while downplaying or ignoring those that do not.

      Anti-psychiatry callously harms patients by denying their suffering, denying them treatments that work, demonizing medical doctors and increasing mental illness stigma. There have been many hundreds of studies on the efficacy of psychiatric medications. They have met their burden of proof, both in the scientific literature and in regulatory investigations. Studies discussed in the above post show that psychiatric treatments are of comparable efficacy to medications in general medicine. Your putrid rage against antidepressants forces you to reject most of modern medicine.

      That is a high price to pay to maintain your ideological beliefs and ultimately an obvious defeater for your position.

    • March 16, 2017 at 18:00

      Thank you! There are millions of us who have been forced off of our meds by insurance companies practicing medicine without a license, and we can testify that without them, we sink a long ways. And some sink so far that they end up committing suicide.

      Now, if we can only start getting enough talk therapy paid for…

    • March 16, 2017 at 18:44

      “Any reasonable scale of depression would weigh suicide behavior as orders of magnitude more important than eating behavior. Also the measured variables, sleeping, eating, mood, guilt etc are not independent. There is a lot of subjectivity and measurement error as well.”

      Wow! That would leave millions of us who are not “suicidal” without any treatment. Why do you think there is so much extreme obesity? Eating to treat untreated depression. People who can’t sleep cause auto collisions. All of these extreme emotions and untreated symptoms lead to repercussions that affect everyone. You reject “subjective” symptoms; carried to an extreme, nobody in pain should get treatment because we can’t prove that they are actually suffering pain.

      I’m sure glad you aren’t in charge!

    • March 21, 2017 at 01:17

      Please propose a delta on the Hamilton scale that would be clinically significant and why that average effect would be meaningful in real world terms to the average patient.

    • March 21, 2017 at 14:28

      Brad, you clearly have not understood the argument. Did you even read the post? Did you read the articles I linked, such as this one?

      The idea is not to come up with some other, equally arbitrary, evidence-free, context-neglecting, black-and-white cutoff for clinical significance.

      The idea is to understand that clinical significance is a spectrum and should take into account a lot more information than just an effect size, such as confidence intervals, numbers needed to treat, side-effects, quality of life, and many other aspects of the scientific context.

      In particular, attacking antidepressants (or any treatment) by using deceptive and dubious tactics like erecting an inappropriate and arbitrary standard and then rejecting everything that falls below it is terribly flawed. It highlights the pseudoscientific nature of anti-psychiatry.

      …and before you repeat your question, yes, I think it is also wrong to claim that statistical non-significance somehow implies equivalence. It does not.



