Looks like Häggström has decided to re-join the crucial discussion of p values and NHST, despite refusing to continue after our last encounter because he claimed (without evidence) that my writings were a “self-parody”. This is reminiscent of childish and narcissistic posters on Internet forums who write a post about how they are leaving the forum because of this or that perceived insult, yet stay around to continue posting. Tiresome and pathetic, especially since he apparently considers a link to the ASA position statement on Twitter to be equivalent to “spewing Twitter bile”. Talk about being offended by even the smallest amount of (fair) criticism.
Häggström recently managed to get a paper defending NHST published in the journal Educational and Psychological Measurement. Perhaps “managed” is not quite the correct word, as it is a journal with a very low impact factor of 1.154 that ranks in the middle or bottom half of journals in mathematical psychology (8 out of 13), educational psychology (30 out of 50) and interdisciplinary applications of mathematics (46 out of 99). Perhaps a low-quality psychology journal is the only place Häggström can get his rabid defense of NHST published? Well, that and a paper from a conference held in Poland. Not exactly impressive stuff.
Ironically, on the very same day he wrote his blog post about his new “paper”, the prestigious American Statistical Association published a position statement severely criticizing NHST. A previous article on this blog discusses several aspects of it in greater detail. Häggström claims that he agrees with the ASA, yet his paper in EPM attempts to refute NHST critics, both those he calls “strongly anti-NHST” and those he labels “weakly anti-NHST”.
Some of the problems with Häggström’s new NHST defense
There are too many errors and problems in his paper to recount in this space, but we can take a closer look at a couple of them:
(1) Häggström presents the NHST situation as a debate, thereby committing the fallacy of false balance.
There is no debate about NHST. The vast majority of published papers discussing NHST are very critical, and hundreds upon hundreds of such papers have appeared in the past 20 years. Today, papers defending NHST are few and far between. This shows that Häggström does not have a sufficient command of the NHST literature, which is, as we shall see, a recurring theme. It also demonstrates that he most likely deliberately deploys a pseudoscientific debating method against his opponents: false balance. This is because he, as a self-identified scientific skeptic with substantial experience from the fight against climate change denialists, knows full well that it is socially effective to attempt to undermine a scientific consensus position by portraying the issue as a debate with two equally legitimate sides. It is not.
(2) Häggström engages in blatant cherry picking
Häggström, like before, takes a critical stance towards the strong anti-NHST decision taken by the journal Basic and Applied Social Psychology (BASP). In a previous blog post, dissected here, he called this “intellectual suicide”. Yet he does not bother to examine similar cases where journals have strongly and explicitly reduced the usage of NHST in favor of something else, to see if those journals did, in fact, commit “intellectual suicide”. Two such examples are Epidemiology (1998) and Psychological Science (2014). Both of these journals have a higher impact factor (6.196 and 4.940, respectively) than BASP (1.168). So either Häggström was completely ignorant of this, or he selectively chose the example of BASP because they did not provide a clear alternative as, e.g., Psychological Science did.
(3) Häggström mischaracterizes the position of Cumming and Ziliak & McCloskey
Häggström labels both Cumming and Ziliak & McCloskey “weakly” anti-NHST. This is highly bizarre, as Cumming advocates the near complete abolition of NHST (apart from perhaps in very preliminary exploratory research) and the latter explicitly call it a cult that destroys lives (this was, in fact, part of their book title). This betrays a profound ignorance of the NHST literature, which is quite surprising since it seems that Häggström has at least read Ziliak & McCloskey. Of course, it is possible to read without understanding (which I suspect is the case).
Let us look closer at the Cumming case because it is the one I am most familiar with. Häggström cites several items from Cumming: a 2009 YouTube video, his 2012 book “Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis”, a blog post at The Conversation also from 2012, and another blog post from the Association for Psychological Science from 2014. This is very odd and reveals terribly sloppy scholarship on Häggström’s part. Why? Häggström did not cite a single paper published by Cumming. Not a single one! Not even the landmark 2014 paper establishing the new statistics guidelines for the Association for Psychological Science journal Psychological Science.
Because Häggström did not do a sufficient survey of Cumming’s writings, he is also left with a wrong view of what Cumming argues. Häggström falsely claims that Cumming is only “weakly” anti-NHST, reasoning that since Cumming defends confidence intervals (CIs), he cannot be all that against NHST. This is wrong on many counts. First, had Häggström checked the 2014 paper in question, he would have read the following:
Simply do not use NHST
I want to be clear about what I am not advocating. I am not suggesting that we simply report CIs alongside NHST. That would most likely lead to the situation currently found in medicine—CIs are reported but not routinely interpreted, and conclusions are largely based on p values. Nor am I suggesting that we report CIs, but not NHST, and then base interpretation primarily on whether or not CIs include zero. That would merely be NHST by stealth. These two approaches would amount to NHST business as usual, perpetuation of all the old problems, and no extra impetus toward research integrity and a cumulative quantitative discipline.
Instead, I recommend following as much as possible all the steps in the eight-step strategy. I include “whenever possible” in my recommendations that we avoid NHST, to cover any cases in which it is not possible to calculate a relevant CI; I expect such cases to be rare, and to become rarer. I strongly suggest that the best plan is simply to go cold turkey, omit any mention of NHST, and focus on finding words to give a meaningful interpretation of the ES estimates and CIs that give the best answers to your research questions. To be clear, I conclude from the arguments and evidence I have reviewed that best research practice is not to use NHST at all; we should strive to adopt best practice, and therefore should simply avoid NHST and use better techniques.
I see no other way to interpret “Simply do not use NHST”, “I strongly suggest that the best plan is simply to go cold turkey, omit any mention of NHST” and “I conclude from the arguments and evidence I have reviewed that best research practice is not to use NHST at all” than that Cumming is very strongly anti-NHST.
Secondly, Häggström claims that Cumming suggests “that reporting p-values and statistical significance should be abolished in favor of reporting confidence intervals”. But this is not accurate. Instead, Cumming wants to abolish NHST in favor of reporting and interpreting effect sizes and CIs in their scientific context, together with replication and meta-analysis.
Thirdly, Häggström claims that Cumming is not all that against NHST because he defends CIs. According to Häggström, CIs are intrinsically tied to p values and it is impossible to use CIs in any other way. This is completely wrong. The point of CIs in a non-NHST setting is to use them to obtain a range of plausible* values for the population parameter, in order to prevent the p value tunnel vision and the black-and-white view of research results that are inherent in NHST. It is not about performing statistical tests with CIs, nor about thinking of CIs purely in the frequentist way, as the procedure whereby 95% of all CIs generated will overlap the true population parameter (see the quote above and the paper it is from). There are other ways of interpreting CIs besides those two. This is something that Häggström would have known had he, in fact, read his own reference to the 2012 book by Cumming. It is right there, on the inside of the front cover of the book! The helpful table can also be found as Table 5.1 on page 129, and it has been discussed on this website before.
In fact, there is an entire section devoted to justifying the use of CIs in a non-NHST context. Häggström does not for a moment engage with these arguments; whether this is due to ignorance of their existence, a realization that he cannot refute them, or something else is unknown.
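The non-NHST use of CIs can be made concrete with a minimal sketch (hypothetical data; this is my illustration, not an example from Cumming’s book): report an effect size estimate together with its 95% CI as a range of plausible values, with no significance test and no p value anywhere.

```python
# Estimation-only reporting: effect size + 95% CI, no p value.
# The data and the t critical value (~14 df) are hypothetical/approximate.
import math
from statistics import mean, stdev

treatment = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7, 5.3, 6.2]
control   = [4.2, 4.9, 4.4, 5.0, 4.1, 4.6, 4.8, 4.3]

# Point estimate of the effect: difference between group means
diff = mean(treatment) - mean(control)

# Standard error of the difference (Welch-style, unpooled variances)
se = math.sqrt(stdev(treatment) ** 2 / len(treatment)
               + stdev(control) ** 2 / len(control))

t_crit = 2.145  # approx. 97.5th percentile of t with ~14 df
lo, hi = diff - t_crit * se, diff + t_crit * se
print(f"mean difference = {diff:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The interval is then read as the range of effect sizes plausibly compatible with the data, and interpreted substantively; nothing is checked against zero and nothing is declared “significant”.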
One powerful indicator of a substandard scholar is failing to properly read his or her references and making claims that are directly refuted in those very references. It is hard to think of another word than embarrassing.
There is an even larger problem: if Häggström were consistent in his claim that “confidence intervals are derived from p-value calculations”, and that this means it is impossible to use CIs without using NHST, he would have to claim that Bayesian statistics is a form of frequentist statistics, since Bayes’ theorem includes the term P(D|H). Clearly, it is wrong to claim that Bayesian statistics is a form of frequentism, when the two frameworks are fundamentally different. In much the same way, it is wrong to claim that The New Statistics as advocated by Cumming is really just NHST in disguise, because it uses CIs for a completely different purpose than NHST.
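The analogy can be made concrete with a toy calculation (the numbers are made up for illustration): a Bayesian update uses the likelihood P(D|H) as one ingredient, yet its output is P(H|D), a posterior probability of the hypothesis, which is something NHST never provides. Sharing one ingredient does not make the frameworks the same.

```python
# Toy Bayesian update with made-up numbers. The likelihood P(D|H)
# appears as one term, but the output is P(H|D) -- a quantity that
# has no counterpart in NHST.
prior_h = 0.10          # P(H): prior probability the hypothesis is true
p_d_given_h = 0.80      # P(D|H): likelihood of the data under H
p_d_given_not_h = 0.30  # P(D|not H): likelihood under the alternative

# Law of total probability, then Bayes' theorem
p_d = p_d_given_h * prior_h + p_d_given_not_h * (1 - prior_h)
posterior = p_d_given_h * prior_h / p_d
print(f"P(H|D) = {posterior:.3f}")
```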
Latest response by Häggström
In his response to my post on the ASA position statement, Häggström also repeats errors that have already been thoroughly debunked on this website. In particular, and as mentioned in the article above, he cited the fallacious claim that a very low p value means either that the null hypothesis is false or that something unlikely happened. I have disproved this claim multiple times with multiple counterexamples, yet Häggström continues to entertain this delusion.
He proposed two objections, both of them crucially flawed:
(i.) He is, in fact, not citing Fisher as an authority, because what Fisher claims is as simple as 7+5=12 and besides, he only cited Fisher because it felt like elegant phrasing.
But NHST is not as simple as 7+5=12, and NHST is certainly not the same as Fisher’s framework. Rather, NHST is as convoluted and erroneous as claiming that 7 + fish = the color green. This is because NHST is a twisted hybrid bastard of the Fisher framework and the Neyman-Pearson framework, and these two contradict each other in several crucial places:
– what the research goals are,
– whether an alternative hypothesis is required,
– what hypothesis is to be tested,
– the nature of p value cut-off,
– what the statistic of interest is,
– if error probabilities are relevant,
– what to do if results fall in the critical region, and how to interpret them.
The resulting specimen cannot generally be considered valid. There are also problems with each of the two frameworks independently (the originators were fierce critics of each other), and even more problems stem from their accurate and inaccurate applications, which further reinforces this conclusion.
How can a mathematical statistician defend such a monstrosity?
(ii.) both of my counterexamples are wrong, because both of them entail that the null hypothesis is false.
Häggström argues that even a tiny, tiny difference between means (such as, say, 10⁻³⁰) is still non-zero and therefore makes the null hypothesis false. But no person in their right mind would say that this is non-negligibly different from zero. Here Häggström is being overly anal and I am not at all impressed. For all intents and purposes, the null hypothesis is true in this case.
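This can be shown quantitatively with a small sketch (my own illustration, with assumed numbers): for a true mean difference of 10⁻³⁰, the rejection probability of a two-sided z-test equals the nominal α to machine precision. The test behaves exactly as if the null were true, which is the sense in which the null is true for all intents and purposes.

```python
# Power of a two-sided two-sample z-test (known sigma), computed
# analytically via the normal CDF. Assumed values: sigma = 1, n = 1000
# per group, critical value 1.96.
import math

def normal_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_two_sided(delta, sigma, n, z_crit=1.96):
    # Noncentrality (expected value) of the z statistic for a true
    # mean difference delta with n observations per group
    ncp = delta / (sigma * math.sqrt(2.0 / n))
    return (1.0 - normal_cdf(z_crit - ncp)) + normal_cdf(-z_crit - ncp)

alpha = power_two_sided(delta=0.0, sigma=1.0, n=1000)    # truly zero difference
tiny  = power_two_sided(delta=1e-30, sigma=1.0, n=1000)  # "non-zero" difference
print(alpha, tiny)  # both ~0.05: the two cases are indistinguishable
```

A difference of 10⁻³⁰ leaves the rejection probability at the nominal 5%; no realistic sample size changes this, so insisting that the null is “false” here is empirically empty.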
Häggström makes a bare assertion that my second counterexample, namely faulty underlying assumptions, entails that the null hypothesis is false. But this is not at all the case. It is possible for two population parameters to be identical while the obtained data do not fulfill the underlying assumptions of a given statistical test. After all, the world does not change because humans use statistical test A instead of B! Note that this counterexample is also given by the ASA, so here Häggström goes against the statistical consensus.
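A quick simulation illustrates the point (my own sketch with assumed parameters): two populations with identical means, but the pooled-variance t-test’s equal-variance assumption is badly violated (a small, noisy group against a large, quiet one). The test then rejects the true null far more often than the nominal 5%, so low p values appear even though the null hypothesis is true.

```python
# Type I error of the pooled-variance (Student) t-test when the
# equal-variance assumption is violated. Both population means are 0,
# so the null is TRUE in every simulated dataset.
import math
import random

random.seed(1)
n1, sd1 = 10, 10.0   # small group, large standard deviation
n2, sd2 = 100, 1.0   # large group, small standard deviation
t_crit = 1.98        # approx. two-sided 5% cutoff for df = n1 + n2 - 2

n_sims = 2000
rejections = 0
for _ in range(n_sims):
    a = [random.gauss(0.0, sd1) for _ in range(n1)]  # same population mean
    b = [random.gauss(0.0, sd2) for _ in range(n2)]  # same population mean
    ma, mb = sum(a) / n1, sum(b) / n2
    va = sum((x - ma) ** 2 for x in a) / (n1 - 1)
    vb = sum((x - mb) ** 2 for x in b) / (n2 - 1)
    sp2 = ((n1 - 1) * va + (n2 - 1) * vb) / (n1 + n2 - 2)  # pooled variance
    t = (ma - mb) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    if abs(t) > t_crit:
        rejections += 1

print(f"false positive rate: {rejections / n_sims:.2f}")  # far above 0.05
```

The pooled variance is dominated by the large low-variance group, so the standard error is badly underestimated and small p values pour out of a test whose null hypothesis is perfectly true.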
* Note that it says plausible and not probable. Had it said probable, it would have committed the fallacy whereby a 95% CI is falsely interpreted as the range in which the true population parameter lies with 95% probability. This is wrong because a single realized CI either contains the population parameter (probability 1 that it does) or it does not (probability 0). The 95% comes from the fact that 95% of all CIs generated by repeating the sampling procedure will contain the true population parameter.