Debunking Denialism

Fighting pseudoscience and quackery with reason and evidence.

Tag Archives: effect size

Mailbag: Anti-Psychiatry Misinformation About Clinical Significance


It is time for another entry in the mailbag series where I answer feedback email from readers and others. If you want to send me a question, comment or any other kind of feedback, please do so using the contact info on the about page.

Anti-psychiatry is a form of pseudoscience that is based on at least three false core beliefs: the denial of the existence or severity of mental illness, the rejection of mainstream treatments for mental illness (including medication and therapy) and the demonization of psychiatrists. There are many different kinds of anti-psychiatry activists. These include some religious extremists who deny the intimate connections between the mind and the brain, some new age believers who wrongly think that it is just a matter of positive thinking, some alternative medicine proponents who falsely claim that it is due to eating too many acidic foods and so on.

In particular, anti-psychiatry activists spread misinformation and hate about psychiatric medications in much the same way that anti-vaccine and anti-GMO activists fearmonger about vaccines and genetically modified foods. Many anti-psychiatry researchers make obvious statistical errors (by wrongly calculating standardized effect sizes) and create smokescreens about the clinical significance of antidepressants by selecting outdated and arbitrary cutoffs, when clinical significance should be based on the totality of evidence and the scientific context.

Read more of this post

How Anti-Psychiatry Researchers Attack Antidepressants With Faulty Statistics

Generic pill image

Anti-psychiatry is a pseudoscience that downplays or rejects the existence and severity of psychiatric conditions, denies the efficacy of established treatments and demonizes medical doctors. Not all anti-psychiatry activists are committed to all three of these positions, but they are common beliefs within the movement. It is thus very reminiscent of anti-vaccine activists who wrongly think that vaccine-preventable diseases are natural and not very harmful, reject vaccines and demonize pediatricians. In terms of debating tactics, anti-psychiatry activists make use of the same standard denialist toolkit: quoting scientists out of context, cherry-picking data, misunderstanding basic science and so on.

A recent paper by Jakobsen and colleagues (2017) claims to have shown that SSRI antidepressants have questionable clinical efficacy. It turns out that they base this claim on a piece of highly deceptive statistical trickery: they erect an arbitrary and evidence-free effect size threshold for clinical significance and then reject all treatments that do not fulfill it.

Because the threshold they picked was so large, they would be forced to reject both psychotherapy and a considerable portion of medications used in general medicine as well. The researchers cite the National Institute for Health and Care Excellence (NICE) as support for their criteria, but NICE dumped these criteria as flawed around eight years ago. In the end, SSRIs are an effective and useful treatment for depression (but do not work perfectly for everyone) and clinical significance is a spectrum and not a black-and-white issue.
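To make the point about thresholds concrete, here is a minimal sketch of how a standardized mean difference (Cohen's d) is computed and how the verdict on "clinical significance" flips depending on which cutoff is picked. All numbers are hypothetical and are not taken from Jakobsen and colleagues or from any trial. The cutoffs 0.2, 0.5 and 0.8 are Cohen's conventional benchmarks, used here only as illustrative thresholds.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def standardized_mean_difference(mean_diff, sd1, n1, sd2, n2):
    """Cohen's d: the raw mean difference divided by the pooled SD."""
    return mean_diff / pooled_sd(sd1, n1, sd2, n2)


# Hypothetical inputs (assumptions for illustration only).
raw_difference = 2.0  # drug minus placebo, in points on a rating scale
d = standardized_mean_difference(raw_difference, sd1=8.0, n1=150, sd2=8.0, n2=150)

# The data never change, but the yes/no verdict depends entirely on the cutoff.
for cutoff in (0.2, 0.5, 0.8):
    verdict = "passes" if d >= cutoff else "fails"
    print(f"d = {d:.2f} against cutoff {cutoff}: {verdict}")
```

The same effect estimate is "clinically significant" under one cutoff and "not clinically significant" under another, which is why a single dichotomous threshold is a poor substitute for looking at the totality of evidence.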

Read more of this post

Apparently, NHST Defenders Could Get Even More Ridiculous

Häggström and NHST, again

Looks like Häggström has decided to rejoin the crucial discussion of p values and NHST, despite refusing to continue after our last encounter because he claimed (without evidence) that my writings were a “self-parody”. This is reminiscent of childish and narcissistic posters on Internet forums who write a post about how they are leaving the forum because of this or that perceived insult, yet stay around to continue posting. Tiresome and pathetic, especially since he apparently considers a link to the ASA position statement on Twitter to be equivalent to “spewing Twitter bile”. Talk about being easily offended by even the smallest amount of (fair) criticism.

Häggström recently managed to get a paper defending NHST published in the journal Educational and Psychological Measurement. Perhaps “managed” is not quite the correct word, as it is a journal with a very low impact factor of 1.154 and is either in the middle or the bottom half of journals in mathematical psychology (8 out of 13), educational psychology (30 out of 50) and interdisciplinary applications of mathematics (46 out of 99). Perhaps a low quality psychology journal is the only place Häggström can get his rabid defense of NHST published? Well, that and a paper from a conference held in Poland. Not exactly impressive stuff.

Ironically, on the very same day he wrote his blog post about his new “paper”, the prestigious American Statistical Association published a position statement severely criticizing NHST. A previous article on this blog discusses several aspects of it in greater detail. Häggström claims that he agrees with the ASA, yet his paper in EPM attempts to refute NHST critics, both those he calls “strongly anti-NHST” and those he labels “weakly anti-NHST”.

Some of the problems with the new NHST defense by Häggström

There are too many errors and problems in his paper to recount in this space, but we can look closer at a couple of them:

(1) Häggström presents the NHST situation as a debate, thereby committing the fallacy of false balance.

There is no debate about NHST. The vast majority of published papers discussing NHST are very critical and hundreds of such papers have been published in the past 20 years. Today, papers defending NHST are few and far between. This shows that Häggström does not have a sufficient command of the NHST literature which is, as we shall see, a recurring theme. It also demonstrates that he most likely deliberately deploys a pseudoscientific debating method called false balance against his opponents. This is because he, as a self-identified scientific skeptic with substantial experience from the fight against climate change denialists, knows full well that it is socially effective to attempt to undermine the scientific consensus position by portraying the issue as a debate with two equally legitimate sides. It is not.

Read more of this post

American Statistical Association Seeks “Post p < 0.05” Era

American Statistical Association

The edifice of null hypothesis significance testing (NHST) is shaken to its core once more. On March 6th, the American Statistical Association (ASA) revealed to the world that it had had enough. For the first time since its founding in 1839, it published a position statement and issued recommendations on a statistical issue. This issue was, of course, p values and statistical significance. The position statement came in the form of a paper in one of its journals, The American Statistician, together with a press release on the ASA website. The executive director of ASA, Ron Wasserstein, also gave an interview with Alison McCook at the website Retraction Watch and the Nature website has a news item about it.

What was the central point of the position statement?

The press release (p. 1) summed it up quite nicely:

“The p-value was never intended to be a substitute for scientific reasoning,” said Ron Wasserstein, the ASA’s executive director. “Well-reasoned statistical arguments contain much more than the value of a single number and whether that number exceeds an arbitrary threshold. The ASA statement is intended to steer research into a ‘post p < 0.05 era.’”

In other words, ASA acknowledges that p values were not supposed to be the central way to evaluate research results, that basing conclusions on p values (and especially on whether the results are statistically significant or not) cannot be considered well-reasoned and finally, that the scientific community should move in a direction that severely de-emphasizes p values and statistical significance. Coming from a world-renowned statistical association, this is a stunning indictment of the mindless NHST ritual.

The final paragraph of the preamble to the position statement (p. 6) also points out that this criticism of NHST is not new:

Let’s be clear. Nothing in the ASA statement is new. Statisticians and others have been sounding the alarm about these matters for decades, to little avail. We hoped that a statement from the world’s largest professional association of statisticians would open a fresh discussion and draw renewed and vigorous attention to changing the practice of science with regards to the use of statistical inference.

ASA seems to share the sentiment among many critics of NHST, namely that there are several valid objections to NHST and that these have been raised as very serious problems for many decades with very little progress.

Read more of this post

The Laughable Desperation of NHST proponents

Häggström again

In a previous post, the many insurmountable flaws and problems of null hypothesis statistical significance testing (NHST) were discussed, such as the fact that p values are only indirectly related to the posterior probability, almost all null hypotheses are false and irrelevant, NHST contributes to black-and-white thinking on research results, p values depend strongly on sample size, and they are unstable with regard to replication. For most realistic research designs, it is essentially a form of Russian roulette. After a mediocre effort, mathematical statistician Olle Häggström failed to defend p values and NHST from this onslaught. Now, he has decided to rejoin the fray with yet another defense of NHST, this time targeting the dance of the p values argument made by Geoff Cumming. Does his rebuttal hold water?
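The instability of p values under replication mentioned above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a reproduction of Cumming's demonstration: the sample size, the true effect of 0.5 standard deviations, the random seed and the large-sample z approximation are all choices made here for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def two_sample_p(a, b):
    """Two-sided p value for a difference in means (large-sample z approximation)."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def replicate(rng, n=32, true_effect=0.5):
    """Run one simulated experiment and return its p value.

    n and true_effect are assumed values; the true effect is identical
    in every replication, only the random samples differ.
    """
    treatment = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return two_sample_p(treatment, control)


rng = random.Random(1)  # fixed seed so the run is repeatable
p_values = [replicate(rng) for _ in range(25)]

print(" ".join(f"{p:.3f}" for p in p_values))
print(f"smallest: {min(p_values):.4f}, largest: {max(p_values):.4f}")
```

Every simulated experiment studies exactly the same underlying effect, so any spread in the printed p values comes from sampling variation alone.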

Arguing from rare exceptions does not invalidate a general conclusion

Häggström seems to be under the impression that if he can find rare and complicated counterexamples, he can undermine the entire case for confidence intervals [being generally superior to p values, see clarification here]. (all translations are my own):

To calculate a confidence interval is akin to calculating p values for all possible parameter values simultaneously, and in more complex contexts (especially when more than one unknown parameter exists) this is often mathematically impossible and/or leads to considerably more complicated and difficult-to-interpret confidence regions than the nice intervals that are obtained in the video.

This is perhaps due to his background in mathematics where a single counterexample really does disprove a general claim. For instance, the function f(x) = |x| is continuous but not differentiable at x = 0, thus disproving the claim that continuity implies differentiability. In the case of confidence intervals, on the other hand, the fact that they work in cases with a single parameter is enough to justify their usage in those cases. Keeping in mind that the vast majority of experiments done in e.g. medicine are probably not complicated estimations of multiple population parameters, but more akin to measuring the effects of a medication compared with placebo, the superiority of confidence intervals over p values for a large portion of experiments stands. Yes, obviously we need more sophisticated statistical tools in more complicated experiments, but that is not a valid argument against confidence intervals in the settings where they can be calculated and where they do work.
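For the simple single-parameter case described above (a medication compared with placebo), a confidence interval for the difference in means takes only a few lines to compute. The sketch below uses made-up scores and a normal approximation. With samples this small a t-based interval would be more appropriate, so treat it as an illustration of the idea rather than a recommended analysis.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_difference_ci(treatment, control, level=0.95):
    """Point estimate and normal-approximation CI for a difference in means."""
    diff = mean(treatment) - mean(control)
    se = math.sqrt(
        stdev(treatment) ** 2 / len(treatment) + stdev(control) ** 2 / len(control)
    )
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z_crit * se, diff + z_crit * se


# Made-up improvement scores (hypothetical data, not from any study).
treatment = [12, 9, 14, 11, 8, 13, 10, 15, 9, 12]
control = [9, 7, 10, 8, 6, 11, 9, 8, 7, 10]

diff, lower, upper = mean_difference_ci(treatment, control)
print(f"difference: {diff:.2f}, 95% CI: [{lower:.2f}, {upper:.2f}]")
```

Unlike a bare p value, the output reports both the size of the estimated difference and how precisely it has been estimated.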

Finally, Häggström continues to refuse to accept that confidence intervals can be dislodged from the framework of NHST.

Read more of this post

Why P-Values and Statistical Significance Are Worthless in Science

P-values are scientifically irrelevant

Why should we test improbable and irrelevant null hypotheses with a chronically misunderstood and abused method with little or no scientific value that has several large detrimental effects even if used correctly (which it rarely is)?

During the past 60+ years, scientific research results have been analyzed with a method called null hypothesis significance testing (NHST) that produces p-values by which the results are then judged. However, it turns out that this is a seriously flawed method. It does not tell us anything about how large the difference was, how precisely it was estimated or what it all means in the scientific context. It tests false and irrelevant null hypotheses. P-values are only indirectly related to posterior probability via Bayes' theorem, what p-value you get for a specific experiment is often determined by chance, the alternative hypotheses might be even more unlikely, it increases the false positive rate in published papers, contributes to publication bias and causes published effect sizes to be overestimated and have low accuracy. It is also a method that most researchers do not understand, neither the basic definitions nor what a specific p-value means.

This article surveys some of these flaws, misunderstandings and abuses and looks at what the alternatives are. It also anticipates some of the objections made by NHST supporters. Finally, it examines a case study consisting of an extremely unproductive discussion with an NHST statistician. Unsurprisingly, this NHST statistician was unable to provide a rationally convincing defense of NHST.

Why NHST is seriously flawed

There are several reasons why NHST is a flawed and irrational technique for analyzing scientific results.

Statistical significance does not tell us what we want to know: A p-value tells us the probability of obtaining at least as extreme results, given the truth of the null hypothesis. However, it tells us nothing about how large the observed difference was, how precisely we have estimated it, or what the difference means in the scientific context.

The vast majority of null hypotheses are false and scientifically irrelevant: It is extremely unlikely that two population parameters would have the exact same value. There are almost always some differences. Therefore, it is not meaningful to test hypotheses we know are almost certainly false. In addition, rejection of the null hypothesis is almost guaranteed if the sample size is large enough. In science, are we really interested in finding out merely whether e.g. a medication is better than placebo? We want to know how much better. Therefore, non-nil null hypotheses might be of more interest. Instead of testing if a medication is equal to placebo, it can be more important to test if a medication is good enough to be better than placebo in a clinically meaningful way.
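The two points above can be sketched numerically. The example assumes a tiny observed difference of 0.05 standard deviations, a known standard deviation of 1 and a hypothetical minimum clinically meaningful difference of 0.30. None of these numbers come from a real study, and the helper names are invented for illustration. It shows the nil-null p value shrinking as the sample grows while the effect stays equally small, and contrasts that with a one-sided test against the non-nil null "the true difference is at most the margin".

```python
import math
from statistics import NormalDist


def nil_null_p(observed_diff, sd, n_per_group):
    """Two-sided z-test p value for H0: true difference = 0 (known common SD)."""
    se = sd * math.sqrt(2 / n_per_group)
    z = observed_diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def non_nil_null_p(observed_diff, margin, sd, n_per_group):
    """One-sided z-test p value for H0: true difference <= margin."""
    se = sd * math.sqrt(2 / n_per_group)
    z = (observed_diff - margin) / se
    return 1 - NormalDist().cdf(z)


observed = 0.05  # assumed observed difference, in SD units
margin = 0.30    # assumed minimum clinically meaningful difference

results = []
for n in (100, 1_000, 10_000, 100_000):
    p_nil = nil_null_p(observed, 1.0, n)
    p_margin = non_nil_null_p(observed, margin, 1.0, n)
    results.append((n, p_nil, p_margin))
    print(f"n per group = {n:>7}: p (nil null) = {p_nil:.4f}, "
          f"p (difference <= {margin}) = {p_margin:.4f}")
```

With enough participants the nil null is rejected even though the difference is negligible, whereas the test against the clinically meaningful margin never comes close to rejection for this effect.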

Read more of this post

Investigative Skepticism Versus the Mass Media

Relationship violence against men

We are constantly being bombarded with messages from newspapers, television, blogs and social media sites like Facebook and Twitter about alleged facts, recently published scientific studies and government reports. With the knowledge that the mass media often get things wrong when it comes to science, how can you separate the signal from the noise?

One popular approach is to check what many different news organizations have to say about the issue. However, this ignores the fact that many websites just rewrite stories they have seen on other websites. Some even go so far as to just copy/paste press releases. In the fast-paced world we live in, getting the “information” out there as fast as possible has apparently come to trump scientific and statistical accuracy. This problem is aggravated in cases when the misinterpretation fits snugly within a particular political or philosophical worldview (e.g. some conservative groups and climate change denialism). Another approach is limiting yourself to only reading news from websites that fit with your own positions. However, this leaves you open to considerable bias. The classic example is anti-immigration race trolls who only read “alternative media”, which tend to twist a lot of the news items they publish to fit with their agenda. A third approach is a combination of the two above: only believe things that news organizations with radically different stances agree on. The downside to this is that it almost never happens with issues that are scientifically uncontroversial, but controversial in the eye of the public (climate change being the obvious example).

This post will outline an explicit investigative method based on scientific skepticism designed to find out the truth behind popular stories on science. To illustrate it, a case study of mass media treatment of two new Swedish studies on relationship violence against men will be described.

Read more of this post

Untangling Steven Novella on Effect Sizes and NHST

NHST and Effect Sizes

Steven Novella is a neurologist, assistant professor, the founder and executive editor of the Science-Based Medicine blog, host of the podcast The Skeptics’ Guide to the Universe, president of the New England Skeptical Society and involved in such skeptical organizations as the James Randi Educational Foundation and Committee for Skeptical Inquiry. In addition, he is one of the scientific skeptics that has influenced me the most and I have benefited greatly from his writings on everything from criticisms of acupuncture to the debunking of anti-psychiatry.

In a previous post, I discussed the very impressive paper where Colquhoun and Novella convincingly showed that acupuncture probably was not better than placebo, and even if it was, the effect was probably clinically negligible. However, in the Science-Based Medicine blog post talking about this paper, Novella expanded on a statistical argument about the extent to which scientists could provide evidence for an effect size of zero for a given treatment. Although no single claim Novella made was wrong in isolation, the overall context in which some of them were stated made the line of reasoning a little bit confusing.

Read more of this post
