Mathematical statistics and probability are hard. They often involve what, at first glance, look like complicated calculations, and the sheer volume of data coming out of some studies can be hard to interpret, even if you know all of the mathematics behind it. Although it is important to understand the math, it is equally important (or perhaps even more important) to understand what the results mean and do not mean. It is easy to get dazzled by fancy mathematics or to over-interpret results to mean something they really do not. Therefore, a basic understanding of statistical fallacies should be a part of every scientific skeptic’s toolbox or baloney detection kit.

Here is a list of the most common statistical fallacies, what they are and how to combat them.

**1. Confusing correlation with causation**

A correlation is when two variables vary together, whereas causation occurs when one factor causes the other. It may be tempting to think that the former implies the latter, but a correlation on its own does not establish causation. For instance, ice cream sales may increase in the summer and decrease in the winter. The same may be true for drowning accidents. Does this mean we can draw the conclusion that drowning accidents cause ice cream sales? Does this mean that people have become so selfish and morally vile that they prefer to buy ice cream and watch people drown rather than try to save them!? Fortunately, not really. Just because two variables vary together does not mean that one caused the other. It might be that the second caused the first, that they both cause each other, or that a third factor causes both. In the case of ice cream sales and drowning accidents, a third factor that probably explains the correlation is season. In the summer, more people eat ice cream and go bathing, but fewer do these things in the winter. Confusing correlation with causation is widespread in many areas of pseudoscience, such as the anti-vaccination movement; one of their claims is that as the number of vaccines given has increased, so have the rates of cancer. This shows only that the two factors correlate, not that vaccines cause cancer (in fact, the vaccines against HPV and hepatitis B can prevent cancers). A more likely explanation is better healthcare as a third factor; better healthcare has meant more vaccines, but also increased lifespan, which is associated with an increase in the risk of cancer.
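The ice cream example can be made concrete with a small simulation. This is only a sketch under made-up assumptions: the sales figures, the drowning index, the noise levels and the helper `pearson` are all hypothetical and chosen for illustration, and season is the only thing that influences either variable.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible

days = []
for _ in range(2000):
    summer = rng.random() < 0.5
    # Both quantities depend on the season plus independent noise.
    # Neither one influences the other. All numbers are invented.
    ice_cream_sales = (200 if summer else 50) + rng.gauss(0, 20)
    drowning_index = (7 if summer else 3) + rng.gauss(0, 1)
    days.append((summer, ice_cream_sales, drowning_index))

# Correlation across the whole year: strongly positive.
overall = pearson([d[1] for d in days], [d[2] for d in days])

# Correlation within one season only: close to zero.
summer_days = [d for d in days if d[0]]
within_summer = pearson([d[1] for d in summer_days],
                        [d[2] for d in summer_days])

print(f"whole year:  r = {overall:.2f}")
print(f"summer only: r = {within_summer:.2f}")
```

Once the third factor is held fixed by looking at summer days only, the correlation largely disappears, which is what one would expect if season drives both variables.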

**2. Post hoc**

Post hoc denotes the fallacy of thinking that A causes B just because B follows A in time. This fallacy, like the fallacy of confusing correlation with causation, is understandable from an evolutionary perspective. Those who were too skeptical of attributing an upset stomach to poisonous berries were less likely to reproduce. However, this kind of instinct-based reasoning can no longer be thought of as justified in our modern society. A clear example of this fallacy is thinking that because dawn occurs after the rooster crows, the rooster caused dawn to occur. This fallacy is pretty much a core feature of anti-vaccination rhetoric.

**3. Thinking that the average says anything about the spread**

This fallacy can often be found in racist or anti-immigration rhetoric. It is the fallacy of claiming that because the average (say, intelligence or prevalence of crime) in a particular group A differs from that in another group B, it is reasonable to treat individuals in group A as if they, say, had lower intelligence or higher prevalence of crime. This is an erroneous argument, because the average says nothing about the spread, that is, how individuals in the group are distributed. When it comes to IQ scores, a group A in which all individuals have precisely 120 IQ will have a higher average IQ than a group B in which half of the individuals have 95 IQ and the other half 140 IQ (0.5*95+0.5*140 = 117.5). Yet clearly half of the individuals in group B have a much higher IQ than any individual in group A, despite the fact that group B has the lower average. Looking only at the averages could feed into a stereotype that individuals in group B have a lower IQ than those in A. These groups and figures are just made up for the purpose of a simple explanation and do not refer to any actually existing group.
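The arithmetic in this example can be checked in a few lines. The sketch below uses the same made-up groups as the text, with a group size of ten assumed purely for illustration.

```python
from statistics import mean, pstdev

# Made-up groups from the example above (group size of 10 is an assumption).
group_a = [120] * 10
group_b = [95] * 5 + [140] * 5

mean_a = mean(group_a)
mean_b = mean(group_b)

# The spread (population standard deviation) is what the average hides.
spread_a = pstdev(group_a)
spread_b = pstdev(group_b)

# Share of individuals in B who score above everyone in A.
share_b_above_a = sum(x > max(group_a) for x in group_b) / len(group_b)

print(f"mean A = {mean_a}, spread A = {spread_a}")
print(f"mean B = {mean_b}, spread B = {spread_b}")
print(f"share of B above all of A = {share_b_above_a}")
```

Group A has the higher mean and no spread at all, while half of group B sits above every member of group A.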

**4. Confusing a priori probability of a specific event with a fortiori probability of a specific class of events**

This sounds really difficult, but it is the statistical fallacy most used by creationists. It revolves around asserting that this or that biological structure has a very low probability of arising, because, say, the mutations needed are so unlikely to occur in the right sequence, and that evolution therefore cannot have produced it. This is the a priori probability. This may seem like a plausible argument to laypeople, but consider the following analogy. Have you ever played bridge? The number of possible bridge hands is 635,013,559,600, so the a priori probability of a specific bridge hand is 1/635,013,559,600. Imagine the silliness of getting a particular bridge hand and then exclaiming that you could not possibly have gotten the hand you just got, since the probability is astronomically low. This is exactly the same error that these kinds of creationists make. In the same way that the relevant question in bridge is not “how likely is the given hand I just got?” but “how likely is it that I get any bridge hand?”, the question should not be “what is the a priori probability that this sequence of mutations occurred?”, but rather “what is the probability of any mutation giving a viable organism?”
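The bridge figure is simply the number of ways to choose 13 cards out of 52, which can be verified directly. The variable names below are illustrative only.

```python
import math
import random

# Number of distinct 13-card hands from a 52-card deck.
possible_hands = math.comb(52, 13)

# A priori probability of one specific, pre-named hand.
p_specific_hand = 1 / possible_hands

# Deal one hand: some hand always turns up, however unlikely
# that exact hand was beforehand.
deck = list(range(52))
hand = random.sample(deck, 13)

print(f"possible hands: {possible_hands:,}")
print(f"P(specific hand) = {p_specific_hand:.3e}")
print(f"cards dealt: {len(hand)}")
```

Every deal produces a hand whose individual probability was astronomically low, yet the probability of getting some hand is 1.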

**5. Confusing statistical significance with clinical significance**

The term “statistical significance” sounds complicated, but if a result is statistically significant, it just means that there is a low probability that the results (or something more extreme) would have occurred if the null hypothesis were true. However, this says nothing about the probability of the null hypothesis given the evidence. It also does not mean that the results are “significant” in a clinical or scientific context. That is, just because the difference between, say, a placebo group and a group given a treatment is statistically significant does not mean that the treatment is highly effective. It is entirely possible that the difference is real (in the sense of not just being caused by chance), but that it is not large enough to merit thinking of the treatment as highly successful in a clinical setting.
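A rough sketch of how this can happen: with a large enough sample, even a tiny difference produces a very small p-value. The example assumes a simple two-sided z-test with a known, common standard deviation and a normal approximation. The numbers (a 0.5 unit difference, standard deviation of 10) and the function names are hypothetical and not taken from any study.

```python
import math


def z_for_mean_difference(diff, sd, n_per_group):
    """z statistic for a difference in means between two equal-sized groups,
    assuming a known common standard deviation."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return diff / standard_error


def two_sided_p(z):
    """Two-sided p-value under a standard normal null distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


diff = 0.5  # hypothetical treatment-vs-placebo difference
sd = 10.0   # hypothetical standard deviation of the outcome

# Standardized effect size (difference in units of sd): tiny either way.
effect_size = diff / sd

p_small_study = two_sided_p(z_for_mean_difference(diff, sd, 100))
p_large_study = two_sided_p(z_for_mean_difference(diff, sd, 100_000))

print(f"effect size = {effect_size}")
print(f"n = 100 per group:     p = {p_small_study:.3f}")
print(f"n = 100,000 per group: p = {p_large_study:.3e}")
```

The effect is the same size in both studies. Only the sample size changes, and with it the p-value, which is why a small p-value alone says little about whether the difference matters clinically.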

There are plenty of other statistical fallacies and ways to misuse and abuse statistics, such as data dredging, unrepresentative samples, drawing overly strong conclusions about the population from a sample, and pseudoreplication. Perhaps future blog posts will treat some of these in more detail.

Categories: Debunking Misuse of Statistics, Science Explained, Skepticism

I meant to leave a comment on your one-year anniversary, but this post finally prompted me to take the time. Great job on the blog. As a blogger who gets even fewer comments and a lot less traffic, it gets hard to keep the dauber up sometimes. Keep up the good work!

Thanks. Although my Alexa ranking is a bit higher than yours, you do have about seven times the posts, more regular posting, about six times the number of websites that link in and about ten times the total number of visitors. Don’t sell yourself short.

Thank you back. Although I’ve found that posting photos with good filenames and alt titles is a secret to getting lots of hits through Google Images. Not readers though. And I comment with my blog url a lot. But thanks.

Yeah, when I comment (with blog URL) on a new post on, say, Pharyngula with something clever, I sometimes get a lot of hits.

Very well said, and very well written. “Ad hoc ergo propter hoc” and “post hoc, ergo propter hoc” are indeed very common. This is indeed a great post in both logic and statistics.

This third fallacy is also called “torturing the numbers”, and is really common in economics: if gov. X has had a high inflation rate with an average of 5%, but the rate has been trending downward from 7 to 3, and gov. Y has had a lower average of 4, but the rate has been trending upward from 3 to 5, then gov. X is doing better at controlling inflation, yet on average it seems that Y has done the better job.

Surely, the Latin term for confusing correlation with causation is “cum hoc, ergo propter hoc” (“with this, therefore because of this”)? Sorry, I just love to nitpick :).

Also, thanks for a more practical example for the third fallacy. I didn’t think of it.

Ha! You’re right, I just got them from memory.

No problem, we had to deal with this sort of thing a lot, and this is where such arguments tend to really hurt, because they can easily be used to manipulate ordinary people.