Why should we test improbable and irrelevant null hypotheses with a chronically misunderstood and abused method that has little or no scientific value and several large detrimental effects even when used correctly (which it rarely is)?
For the past 60+ years, scientific research results have been analyzed with a method called null hypothesis significance testing (NHST), which produces p-values that the results are then judged by. However, it turns out that this is a seriously flawed method. It does not tell us anything about how large the difference was, how precisely we have estimated it, or what it all means in the scientific context. It tests false and irrelevant null hypotheses. P-values are only indirectly related to posterior probability via Bayes' theorem, and the p-value you get for a specific experiment is often determined by chance. The alternative hypotheses might be even more unlikely than the null. NHST increases the false positive rate in published papers, contributes to publication bias, and causes published effect sizes to be overestimated and to have low accuracy. It is also a method that most researchers do not understand, neither the basic definitions nor what a specific p-value means.
This article surveys some of these flaws, misunderstandings and abuses and looks at what the alternatives are. It also anticipates some of the objections made by NHST supporters. Finally, it examines a case study consisting of an extremely unproductive discussion with an NHST statistician. Unsurprisingly, this NHST statistician was unable to provide a rationally convincing defense of NHST.
Why NHST is seriously flawed
There are several reasons why NHST is a flawed and irrational technique for analyzing scientific results.
Statistical significance does not tell us what we want to know: A p-value tells us the probability of obtaining results at least as extreme as those observed, given that the null hypothesis is true. However, it tells us nothing about how large the observed difference was, how precisely we have estimated it, or what the difference means in the scientific context.
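To make this concrete, here is a minimal sketch in Python. It compares two hypothetical studies whose numbers are invented purely for illustration, and it assumes a simple normal (z) approximation with a known standard error. The helper names `two_sided_p` and `summarize` are not from any library. Both studies have the same test statistic and therefore the same p-value, yet the estimated differences and the widths of the confidence intervals differ by a factor of 100.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def summarize(diff, se):
    """Return (p-value, 95% CI lower bound, 95% CI upper bound).

    diff is the estimated difference and se its standard error,
    both assumed known (normal approximation).
    """
    z = diff / se
    half_width = 1.96 * se
    return two_sided_p(z), diff - half_width, diff + half_width


# Hypothetical studies: the numbers are assumptions, not real data.
study_a = summarize(diff=10.0, se=5.0)   # large difference, imprecise estimate
study_b = summarize(diff=0.1, se=0.05)   # tiny difference, precise estimate

for name, (p, lower, upper) in (("A", study_a), ("B", study_b)):
    print(f"Study {name}: p = {p:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Judged by the p-value alone, the two studies are indistinguishable. Only the effect size and the interval around it show that they describe very different situations.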
The vast majority of null hypotheses are false and scientifically irrelevant: It is extremely unlikely that two population parameters would have exactly the same value. There are almost always some differences. Therefore, it is not meaningful to test hypotheses we know are almost certainly false. In addition, rejection of the null hypothesis is almost guaranteed if the sample size is large enough. In science, are we really interested in finding out whether, e.g., a medication is better than placebo? We want to know how much better. Therefore, non-nil null hypotheses might be of more interest. Instead of testing whether a medication is equal to placebo, it can be more important to test whether a medication is better than placebo by a clinically meaningful margin.
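The sample-size point can be sketched numerically. The Python example below rests on several simplifying assumptions: two equally sized groups, a known common standard deviation, an observed difference that happens to equal a tiny true difference of 0.01 standard deviations, and a hypothetical clinically meaningful margin of 0.2 standard deviations. The function names and all numbers are illustrative, not taken from any study.

```python
import math


def standard_error(sd, n_per_group):
    """Standard error of a difference in means, equal group sizes, known SD."""
    return sd * math.sqrt(2.0 / n_per_group)


def p_nil_null(diff, sd, n_per_group):
    """Two-sided p-value for the nil null hypothesis 'difference = 0'."""
    z = diff / standard_error(sd, n_per_group)
    return math.erfc(abs(z) / math.sqrt(2))


def p_non_nil_null(diff, sd, n_per_group, margin):
    """One-sided p-value for the non-nil null 'difference <= margin'."""
    z = (diff - margin) / standard_error(sd, n_per_group)
    return 0.5 * math.erfc(z / math.sqrt(2))


TRUE_DIFF = 0.01  # assumed tiny effect, in SD units
SD = 1.0          # assumed known common standard deviation
MARGIN = 0.2      # hypothetical clinically meaningful difference

# Same tiny difference, increasingly large samples.
p_by_n = {n: p_nil_null(TRUE_DIFF, SD, n) for n in (100, 10_000, 1_000_000)}
for n, p in p_by_n.items():
    print(f"n per group = {n:>9,}: p = {p:.3g}")

# With a million participants per group, is the effect clinically meaningful?
p_meaningful = p_non_nil_null(TRUE_DIFF, SD, 1_000_000, MARGIN)
print(f"p for 'difference exceeds margin' = {p_meaningful:.3g}")
```

Under these assumptions the nil null hypothesis goes from clearly not rejected to overwhelmingly rejected as the sample grows, although the difference itself never changes. The test against the margin, by contrast, gives no support for a clinically meaningful effect.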