In recent years, scientists have become increasingly aware of the problem of failing to reproduce scientific findings. Part of the answer is the normal workings of science, in which flawed results are weeded out by the process of self-correction. Other aspects of the problem boil down to the publish-or-perish culture of scientific research, the reluctance to publish negative results, and the misuse and abuse of statistics.
For far too long, the statistical treatment of scientific data has been overly focused on statistical testing and p values. In reality, a p value only tells you the probability of obtaining data at least as extreme as those observed, assuming the null hypothesis is true. It does not tell you how large the effect you measured was, how precisely it was estimated, what it means in the scientific context, or whether it can be replicated by independent researchers. But the tantalizing allure of the p value has spellbound generations of scientists who wanted to take shortcuts to good scientific research.
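To make this concrete, here is a minimal Python sketch (standard library only, with invented numbers) of a two-sided one-sample z-test. It illustrates the point above: a p value tracks sample size as much as effect size, so the very same negligible difference of 0.02 standard deviations comes out "non-significant" in a small sample and highly "significant" in a huge one.

```python
import math

def z_test_p_value(mean, mu0, sd, n):
    """Two-sided p value for a one-sample z-test (normal approximation).

    erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail area.
    """
    z = (mean - mu0) / (sd / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

# The same tiny effect (mean 100.2 vs. 100.0, SD 10, i.e. 0.02 SD):
p_small = z_test_p_value(mean=100.2, mu0=100.0, sd=10.0, n=50)
p_large = z_test_p_value(mean=100.2, mu0=100.0, sd=10.0, n=200_000)
print(p_small)  # well above 0.05: "not significant"
print(p_large)  # minuscule p value for the identical, trivial effect
```

Neither p value says anything about whether a 0.2-point difference matters in practice; that question needs the effect size and its context.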
This has led to extremely damaging consequences in science, including black-and-white thinking, overestimated effect sizes, the conflation of statistical and practical significance, and severely impaired research progress, where as much as half of all research cannot be replicated. A useful way out is to drop the strong emphasis on mindless statistical testing and focus on the things that researchers and policymakers are actually interested in.
Effect sizes tell you how large the measured difference was, confidence intervals tell you how precisely it was estimated, and putting the two into the scientific context tells you what the difference and its precision mean. With the help of meta-analysis and replication, scientists can find out whether their findings hold up or were distorted by bias and error. Journals are now becoming much harsher towards mindless statistics: they have published statistics guidelines and added peer reviewers with expertise in statistics.
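As a rough illustration of what estimation looks like in practice, here is a small Python sketch (standard library only, with invented data) that computes Cohen's d and an approximate 95% confidence interval for the difference between two group means. Note the hedge in the code: for small samples like these, a t critical value would be more accurate than the normal-approximation 1.96 used here.

```python
import math
import statistics as stats

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stats.variance(a)
                  + (nb - 1) * stats.variance(b)) / (na + nb - 2)
    return (stats.mean(a) - stats.mean(b)) / math.sqrt(pooled_var)

def mean_diff_ci(a, b, z=1.96):
    """Approximate 95% CI for the difference in means.

    Uses the normal approximation; for small samples a t critical
    value would be more accurate than z = 1.96.
    """
    diff = stats.mean(a) - stats.mean(b)
    se = math.sqrt(stats.variance(a) / len(a) + stats.variance(b) / len(b))
    return diff - z * se, diff + z * se

# Invented example data for two independent groups:
treatment = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7, 5.3, 5.6]
control   = [4.2, 4.9, 4.1, 5.0, 4.4, 4.6, 4.3, 4.8]
print(cohens_d(treatment, control))      # standardized effect size
print(mean_diff_ci(treatment, control))  # plausible range for the difference
```

The interval, unlike a bare p value, shows both the direction and the plausible magnitude of the effect, which is exactly what a reader needs in order to judge practical significance.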
The New Statistics: Why and How is a position paper written by Geoff Cumming (emeritus professor at La Trobe University) about how the statistical treatment of data should be improved, published in 2014 in Psychological Science, a journal of the Association for Psychological Science.
Cumming advocates letting the mindless and inconsistent procedure of null hypothesis significance testing (NHST) fall by the wayside, replacing it with more meaningful and robust methods of analyzing scientific data, such as effect-size and interval estimation, as well as meta-analysis and replication.
The paper surveys many of the flaws and problems with NHST, proposes a list of 25 guidelines for improving the empirical sciences and their treatment of research data, and even presents empirical evidence that confidence intervals are better than p values for interpreting research results.
Even better, Cumming outlines an eight-step process for using estimation methods in research and for interpreting effect sizes and confidence intervals. The paper also offers realistic examples of how to use them in cases of two independent groups, correlations, proportions, and more complex research designs. Finally, it covers how to use meta-analysis methods, how to practice meta-analytic thinking, and how to use forest plots to illustrate the results.
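The meta-analytic side of this can be sketched with a standard fixed-effect, inverse-variance weighted combination of study results. This is textbook meta-analysis rather than Cumming's specific notation, and the study numbers below are invented for illustration:

```python
import math

def fixed_effect_meta(estimates, standard_errors):
    """Fixed-effect meta-analysis: inverse-variance weighted combination
    of per-study effect estimates. Returns the pooled estimate and its
    approximate 95% confidence interval."""
    weights = [1 / se**2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

# Hypothetical effect estimates (e.g. mean differences) from three studies:
effects = [0.42, 0.31, 0.55]
ses     = [0.12, 0.20, 0.15]
pooled, ci = fixed_effect_meta(effects, ses)
print(pooled, ci)  # pooled estimate, narrower than any single study's CI
```

This is the arithmetic behind a forest plot: each study contributes in proportion to its precision, and the pooled interval is tighter than any individual study's, which is why meta-analytic thinking rewards replication.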
This paper is not just for people who want to publish in this journal. Rather, it offers a highly accessible outline of new approaches to the statistical analysis of research results and may make readers more wary of statistics abuse while boosting their statistical cognition. In particular, it provides the tools to think critically about p values, their role in the reporting of scientific results, and what kind of conclusions one can (and cannot) draw from any given result.