**Note:** Phil Plait has now admitted and corrected his statistical mistakes, which is very admirable and a sign of a genuine scientific skeptic. Read more about it here (note added 15:26 GMT +1 2013-03-20).

I dread writing this post. That is because I have great intellectual admiration for Phil Plait. He is a great champion of reason and a powerful opponent of all things pseudoscience. From climate change denialists to moon landing conspiracies, Plait swings his katana of rigorous scientific skepticism and destroys all of it. However, errors made by a fellow skeptic should not be ignored.

Recently, Plait wrote a wonderful debunking piece on how, despite denialists insisting otherwise, global warming has not stopped. This notion has become something of a staple for climate change denialists, kind of like how “evolution is just a theory” is for creationists. Skeptical Science, one of the best sites for debunking climate myths used by denialists, discusses it here and also has a well-written piece on the same topic that Plait wrote about.

Let’s get one thing straight: Phil Plait is absolutely correct about the science. Global warming has not stopped, David Rose is completely wrong, and Plait explains why in a very accurate and persuasive piece of writing. Here, as a summary, are the major errors made by David Rose (as reported by Plait and Skeptical Science):

- Rose picks a graph showing air temperatures, which is a somewhat misleading indication of global warming as the heat is rapidly absorbed into the ocean.
- Rose cherry-picks a short time interval, which is inappropriate as it only gives you an estimate of short-term fluctuations in temperature and not long-term trends.
- Rose is ignoring the effects of La Niña.
- The observed data is still within the 90% confidence intervals as reported by IPCC/MET Office/Ed Hawkins.
- There is no scientific controversy about the question “Do humans contribute substantially to the current global warming trend?” The evidence has established beyond all reasonable doubt that the answer is yes.

I have absolutely no disagreement with these points. They are solid refutations of what Rose claimed.

With this in mind, let us examine the statistical errors committed by Plait in his otherwise excellent article. They do not undermine his refutation of Rose or his defense of good climate science, but they are still common statistical errors that should not have been made.

**Those error bars are 90% confidence intervals, not 95%**

Plait states that the error bar used by Rose is a 95% confidence interval. This is wrong. It is technically an error committed by Rose himself because, as we shall see, he misunderstood the information in the original graph produced by Ed Hawkins. However, Plait foolishly placed unfounded trust in Rose on the nature of those confidence intervals without checking the details of the source graph. Here is what dana1981, John Russell and John Mason write on Skeptical Science in the article David Rose Hides the Rise in Global Warming:

Despite being on the lower end of that range, global surface temperatures are nevertheless within the 90% (5–95%) confidence interval of model runs, and therefore have not been overestimated. Note that we expect the data to fall outside the 90% range 10% of the time, so even if the observational temperatures were outside that envelope (which they are not), it wouldn’t necessarily mean that global surface warming has been overestimated.

Ed Hawkins explains it like this in his Updated comparison of simulations and observations (hyperlinks omitted):

[UPDATE (17/03/13): David Rose has written an article in the Mail on Sunday which, by eye, seems to use the top left panel from the figure below, but without mention of its original source. In the article David Rose suggests that this figure proves that the forecasts are wrong. This is incorrect – the last decade is interesting and I have discussed these issues previously (as have many others) and I have even co-authored a published article about the most sensitive simulations being less likely. David also incorrectly suggests that the shaded ranges shown are 75% and 95% certainty. As labelled below, they are actually the 25-75% and 5-95% ranges, so 50% and 90% certainty respectively.]

Hawkins also makes a statistical error in using the term “certainty” instead of “confidence” (more about this later). However, Hawkins does accurately state that it is a 90% CI.
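To see why a 5–95% range amounts to 90% rather than 95%, here is a small Python sketch. The ensemble is made up (random numbers standing in for model runs, not Hawkins's data), and `central_range` and `fraction_inside` are names I chose purely for illustration.

```python
import random
import statistics

def central_range(values, lower_pct, upper_pct):
    """Return the (lower, upper) percentile bounds of `values`."""
    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]

def fraction_inside(values, bounds):
    """Share of `values` that fall within `bounds` (inclusive)."""
    low, high = bounds
    return sum(low <= v <= high for v in values) / len(values)

rng = random.Random(0)
# Hypothetical ensemble: 10,000 invented "model run" values, not climate data.
runs = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

for lower_pct, upper_pct in [(25, 75), (5, 95)]:
    bounds = central_range(runs, lower_pct, upper_pct)
    share = fraction_inside(runs, bounds)
    print(f"{lower_pct}-{upper_pct}% range holds {share:.1%} of the runs")
```

The width of the range is simply the upper percentile minus the lower one, so 25-75% covers about half of the runs and 5-95% about nine tenths.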

**Confusing confidence level with confidence interval**

The statistical concept that is relevant for the graphs in question is *confidence intervals*. An easy way to understand it is this: imagine drawing a sample from a population a very large number of times and, each time, calculating the sample mean and a 95% confidence interval around it. About 95% of those intervals will capture the true population mean. However, Plait keeps talking about confidence level. This is not an interval. It is just the number (here 95%) attached to the procedure that generates the intervals. This does not mean that we are 95% certain that our particular 95% confidence interval includes the population parameter, only that if we take a very large number of samples and calculate an interval from each, 95% of those intervals will include the true population parameter.
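The repeated-sampling idea can be checked with a short simulation. This is only a sketch under simplifying assumptions: the population is normal, its standard deviation is treated as known (so a z-based interval applies), and the population values and function names are invented for the example.

```python
import random
import statistics

def ci_for_mean(sample, sigma, level=0.95):
    """z-based confidence interval for the mean, assuming sigma is known."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / len(sample) ** 0.5
    centre = statistics.fmean(sample)
    return centre - half_width, centre + half_width

def coverage(true_mean, sigma, n, repeats, level=0.95, seed=1):
    """Fraction of repeated-sample intervals that contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = ci_for_mean(sample, sigma, level)
        hits += low <= true_mean <= high  # True counts as 1
    return hits / repeats

# Hypothetical population: mean 10, standard deviation 2, samples of 25.
print(coverage(true_mean=10.0, sigma=2.0, n=25, repeats=10_000))
```

Each individual interval either contains the true mean or it does not. The 95% only shows up as the long-run share of intervals that do.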

**Misunderstanding the nature of confidence (intervals and levels)**

I have already alluded to a common misconception about confidence intervals above: the 95% in a 95% confidence interval is not the probability that our particular confidence interval includes the population parameter (that probability is either 0% or 100%: either it does include the population parameter or it does not). It is about the proportion of all the confidence intervals that could be constructed (by repeatedly taking samples from the population) that include the true population parameter. To put it simply, confidence intervals are frequentist, not Bayesian.

**Misunderstanding statistical significance tests**

Plait also makes a serious statistical error when claiming the following:

Something at the 95 percent level means there’s only a 5 percent chance the numbers are due to random noise, for example.

Let us unpack this statement. First of all, there is a relationship between confidence intervals and p-values that we must look at. If a calculated mean has a 95% confidence interval that does not include the value specified by the null hypothesis, then we can say that the difference between the mean and the null value is statistically significant at the 0.05 level (this becomes more complex if we are comparing two groups). This is the gist of what Plait is trying to say, but he also misunderstands what statistical significance means.
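The link between a 95% confidence interval and p < 0.05 can be sketched in a few lines of Python. To keep it exact, I assume a one-sample z-test with a known standard deviation. The data, the value of `sigma` and the function names are all invented for illustration.

```python
import statistics

STD_NORMAL = statistics.NormalDist()

def two_sided_p(sample, null_mean, sigma):
    """p-value for H0: population mean == null_mean, with sigma assumed known."""
    se = sigma / len(sample) ** 0.5
    z = (statistics.fmean(sample) - null_mean) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def ci95(sample, sigma):
    """95% z-based confidence interval for the mean, with sigma assumed known."""
    se = sigma / len(sample) ** 0.5
    half_width = STD_NORMAL.inv_cdf(0.975) * se
    centre = statistics.fmean(sample)
    return centre - half_width, centre + half_width

# Made-up measurements; sigma = 1.0 is an assumption, not an estimate.
sample = [0.9, 1.4, 0.2, 1.1, 0.7, 1.6, 0.5, 1.2]
low, high = ci95(sample, sigma=1.0)

for null_mean in (0.0, 1.0):
    p = two_sided_p(sample, null_mean, sigma=1.0)
    inside = low <= null_mean <= high
    print(f"null={null_mean}: p={p:.3f}, inside 95% CI: {inside}")
```

A null value outside the interval goes with p < 0.05, and a null value inside it goes with p > 0.05. Neither number says how probable the null hypothesis itself is.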

He seems to be saying that statistical significance with p < 0.05 (i.e. null hypothesis outside the 95% confidence interval) means that there is a 5% chance that the observed data are due to random noise. However, this is not accurate. First of all, a p-value is a conditional probability: it is the probability that the observed data, or more extreme data, would have been observed given that the null hypothesis is true.

Second, it has nothing to do with the probability that the results are due to chance. This is because the sample statistic for a significance test is calculated under the assumption that all deviations from the null hypothesis are due to chance. Rex B. Kline explains (Kline, 2004, pp. 63-64):

A p value is the probability that the result is a result of sampling error; thus, p < .05 says that there is less than a 5% likelihood that the results happened by chance. This false belief is the odds-against-chance fantasy (Carver, 1978). It is wrong because p values are computed under the assumption that sampling error is what causes sample statistics to depart from the null hypothesis. That is, the likelihood of sampling error is already taken to be 1.00 when a statistical test is conducted. It is thus illogical to view p values as measuring the probability of a sampling error. This fantasy together with others listed later may explain the related fallacy that statistical tests sort results into two categories, those a result of chance (H₀ is not rejected) and others a result of “real” effects (H₀ is rejected). Unfortunately, statistical tests applied in individual studies cannot make this distinction. This is because any decision based on NHST outcomes may be wrong (i.e. a Type I or Type II error).
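A rough simulation shows why p < 0.05 cannot be read as “a 5% chance the result is due to chance”. To even ask how many significant results come from a true null hypothesis, we have to assume how often the null is true in the first place, and the p-value knows nothing about that. Everything below is hypothetical: the share of true nulls, the effect size, the sample size and the function name.

```python
import random
import statistics

STD_NORMAL = statistics.NormalDist()

def share_true_null_among_significant(prior_null, effect, n, studies, seed=2):
    """Among simulated studies with p < 0.05, the share where H0 was true.

    Assumes a z-test with sigma = 1, a true mean of 0 under H0 and a true
    mean of `effect` otherwise. `prior_null` is the assumed share of
    studies in which H0 really holds.
    """
    rng = random.Random(seed)
    se = 1 / n ** 0.5
    significant = 0
    null_and_significant = 0
    for _ in range(studies):
        null_is_true = rng.random() < prior_null
        true_mean = 0.0 if null_is_true else effect
        # Draw the sample mean directly from its sampling distribution.
        sample_mean = rng.gauss(true_mean, se)
        p = 2 * (1 - STD_NORMAL.cdf(abs(sample_mean / se)))
        if p < 0.05:
            significant += 1
            null_and_significant += null_is_true
    return null_and_significant / significant

for prior_null in (0.5, 0.9):
    share = share_true_null_among_significant(prior_null, effect=0.5, n=20, studies=50_000)
    print(f"H0 true in {prior_null:.0%} of studies -> "
          f"{share:.1%} of significant results come from a true H0")
```

The same 0.05 threshold gives very different answers depending on the assumed share of true nulls, which is exactly why a p-value on its own cannot be the probability that a result is due to chance.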

**How should we think about confidence intervals?**

Geoff Cumming (2012), a strong defender of the use of confidence intervals, points to the following six ways of interpreting a 95% confidence interval. I have simplified them so readers not so familiar with statistics can understand the gist:

1. A 95% confidence interval is one of the intervals produced by a procedure that, over repeated sampling, captures the true population mean 95% of the time.

2. A 95% confidence interval is a range of plausible (using the term probable here would be wrong as we saw above) values for the population parameter.

3. A 95% confidence interval is a margin-of-error that gives a likely maximum error for the estimation, even though larger errors are possible.

4. The values closer to the mean are relatively more plausible than the values nearer the ends of the confidence interval.

5. The relationship between confidence intervals and statistical significance (outlined earlier).

6. On average, a 95% confidence interval is an 83% prediction interval: it has about an 83% chance of capturing the mean of a replication study.
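The 83% figure in the last point can be reproduced with a short calculation. This is my own sketch, not Cumming's derivation, and it assumes two independent samples of equal size from the same normal population with a known standard deviation. Under those assumptions the difference between the original mean and the replication mean has a standard error that is √2 times that of a single mean.

```python
import statistics

def capture_probability(level=0.95):
    """Chance that a replication mean lands inside the original CI.

    Assumes equal sample sizes, a normal population and known sigma, so
    the original interval has half-width z * se while the difference
    between the two sample means has standard error sqrt(2) * se.
    """
    std_normal = statistics.NormalDist()
    z = std_normal.inv_cdf(0.5 + level / 2)
    return 2 * std_normal.cdf(z / 2 ** 0.5) - 1

print(f"{capture_probability():.3f}")
```

The result is noticeably below 95% because the replication mean has its own sampling error on top of the error in the original estimate.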

**The intellectual responsibility of a public intellectual**

When it comes to the science, Phil Plait is absolutely right and David Rose is completely wrong. Yet, Phil Plait made a couple of common statistical errors in his article. Most of them were a result of his own lack of detailed familiarity with the topic, but one occurred because he trusted some of the information provided by David Rose.

Spreading incorrect information about basic statistics is unfortunate, and it is the responsibility of a scientist, science educator and public intellectual like Phil Plait to avoid doing so.

**References and further reading**

Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association.

Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York: Routledge.


Phil Plait has now corrected his mistakes, although I was not the causal factor in making him correct them: Plait states that he was corrected by David Cade at Oregon State, who apparently made more or less the same criticisms as I did in the above post. Here is what Plait wrote at the bottom of his article:

Here is what the section now states:

Plait now (1) correctly uses the term “confidence interval” instead of “confidence level”, (2) has removed the section about “5% chance the numbers are due to random noise” and (3) no longer implies that confidence intervals are about the probability of overlapping the true population parameter.

The term “measurements” here is a bit ambiguous though, as it is not clear if he is referring to future measurements (correct) or the measurements in the graph. Remember, a 95% confidence interval means that 95% of the confidence intervals that can be generated by repeatedly taking samples will overlap the true population parameter. It says nothing about the probability that this specific 95% confidence interval overlaps the population parameter: that probability is either 100% (it does overlap) or 0% (it does not overlap).

It is fine to simplify complex statistical topics, but somewhere in that gray area is where oversimplified morphs into wrong.

At any rate, Phil Plait acknowledged and corrected his mistakes very fast. That is a very admirable trait for a scientific skeptic. Consider this issue resolved.