Reformers say psychologists should change how they report their results, but does anyone understand the alternative?

The rectangular bars indicate sample means and the red lines represent the confidence intervals surrounding them. Image: Audriusa/Wikipedia

Psychological science is undergoing a process of soul-searching and self-improvement. The reasons vary but include failed replications of high-profile findings, evidence of bias in what gets published, and surveys suggestive of questionable research practices.

Among the proposed solutions is that psychologists should change the way they report their findings. Traditionally, most research papers report a “p value” that indicates whether the results are statistically significant or not. Critics say this approach encourages “p hacking” – tweaking the experimental protocol until the results reach the magic threshold of significance.

An alternative approach proposed by some reformers is to report “confidence intervals”. A confidence interval gives a lower and an upper limit between which the true mean might lie. The “true mean” here refers to the average score or measure of a whole population, as opposed to the average score of your particular sample: imagine measuring the height of 100 men to try to estimate the “true” average height of men in the country.
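As a rough illustration of the height example, here is a minimal Python sketch that computes a 95% confidence interval from one sample of 100 men. The population figures (a true mean of 178 cm and a standard deviation of 7 cm) and the function name are invented for illustration, and the sketch uses a normal approximation rather than the t distribution that a statistics package would typically use.

```python
import random
import statistics


def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / n ** 0.5
    # Critical value from the standard normal distribution (about 1.96 for 95%)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std_err, mean + z * std_err


rng = random.Random(1)
# Hypothetical population: heights in cm, true mean 178, standard deviation 7
heights = [rng.gauss(178, 7) for _ in range(100)]

low, high = mean_confidence_interval(heights)
print(f"sample mean = {statistics.fmean(heights):.1f} cm")
print(f"95% CI = ({low:.1f}, {high:.1f}) cm")
```

A different sample of 100 men would give a different sample mean and therefore a different interval, which matters for how the interval should be read.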

Confidence intervals are less prone to misinterpretation, say advocates, and they help avoid dichotomous thinking – the idea that a result is either significant or it isn’t. The American Psychological Association strongly endorses the use of confidence intervals.

Now a group of psychologists in the Netherlands has tested whether confidence intervals really are as well understood as their supporters claim. Rink Hoekstra and his colleagues surveyed 442 first-year psychology students who had yet to complete any statistics classes; 34 master's students; and 120 psychology researchers including doctoral students and lecturers.

All the participants were presented with a basic research premise:

Professor Bumbledorf conducts an experiment, analyses the data, and reports: “The 95% confidence interval for the mean ranges from 0.1 to 0.4!”

The participants were then presented with six statements that purported to follow from Bumbledorf’s result. In each case they had to indicate whether the statement was true or false. They were told that it was possible all the statements were true, all were false, or that there was a mix of true and false. Here are three of them:

  • The probability that the true mean is greater than 0 is at least 95%
  • There is a 95% probability that the true mean lies between 0.1 and 0.4
  • If we were to repeat the experiment over and over, then 95% of the time the true mean falls between 0.1 and 0.4

In fact, all six statements were false, and the alarming result is that many were endorsed as true, not just by the students but also by the established psychology researchers. On average, the first-year students endorsed 3.51 of the six statements, the master's students 3.24, and the researchers 3.45. The survey also included a question about experience with statistics. Participants who reported more experience of statistics endorsed just as many of the false statements.

The correct interpretation of Bumbledorf’s statement is that if you drew 100 samples using his methods and computed a 95 per cent confidence interval from each, you would expect about 95 of those intervals to contain the true value of the mean. Note that each sample would have its own interval (i.e. its own upper and lower limit), a fact which is inconsistent with the three test statements shown above.
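The repeated-sampling reading can be checked with a small simulation. The sketch below is only an illustration under assumed settings: the true mean of 0.25, the standard deviation of 1, the sample size of 100 and the function names are all invented here rather than taken from the study, and the intervals use a normal approximation.

```python
import random
import statistics


def interval_for_mean(sample, level=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / n ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std_err, mean + z * std_err


def coverage(true_mean=0.25, sd=1.0, n=100, repeats=5000, seed=0):
    """Share of repeated experiments whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Each repeat is a fresh experiment with its own sample and interval
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        low, high = interval_for_mean(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / repeats


print(f"share of intervals containing the true mean: {coverage():.3f}")
```

Under these assumptions the printed share should come out close to 0.95. The 95 per cent describes how often the procedure captures the true mean across repeated samples, not the probability that any single reported interval, such as 0.1 to 0.4, contains it.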

Hoekstra and his colleagues said their results showed “dramatic and similar levels of misinterpretation among both researchers and students.” They added: “One could question whether our findings indicate a serious problem in scientific practice, rather than merely an academic issue. We argue that they do indicate a serious problem, one closely related to what some authors refer to as a crisis in the social and behavioural sciences.”


Hoekstra, R., Morey, R., Rouder, J., & Wagenmakers, E. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21(5), 1157-1164. DOI: 10.3758/s13423-013-0572-3

Further reading
Made it! An uncanny number of psychology findings manage to scrape into statistical significance

Post written by Christian Jarrett (@psych_writer) for the BPS Research Digest.
