Like a tired boxer at the Olympic Games, the reputation of psychological science has just taken another punch to the gut. After a series of fraud scandals in social psychology and a US survey that revealed the widespread use of questionable research practices, a paper published this month finds that an unusually large number of psychology findings are reported as “just significant” in statistical terms.
The pattern of results could be indicative of dubious research practices, in which researchers nudge their results towards significance, for example by excluding troublesome outliers or adding new participants. Or it could reflect a selective publication bias in the discipline – an obsession with reporting results that have the magic stamp of statistical significance. Most likely it reflects a combination of both these influences. On a positive note, psychology, perhaps more than any other branch of science, is showing an admirable desire and ability to police itself and to raise its own standards.
E. J. Masicampo at Wake Forest University, USA, and David Lalande at Université du Québec à Chicoutimi analysed 12 months of issues, July 2007 – August 2008, from three highly regarded psychology journals – the Journal of Experimental Psychology: General; Journal of Personality and Social Psychology; and Psychological Science.
In psychology, a common practice is to determine how probable (p) it is that results at least as extreme as those observed in a study could have been obtained if the null hypothesis were true (the null hypothesis usually being that the treatment or intervention has no effect). The convention is to treat a probability of less than five per cent (p < .05) as an indication that the treatment or intervention really did have an influence, so that the null hypothesis can be rejected (this procedure is known as null hypothesis significance testing).
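To make the definition of a p value concrete, here is a minimal sketch in Python. It is not the method used in any of the studies discussed here (psychology papers more commonly report t tests or ANOVAs); an exact permutation test is swapped in because it follows the definition most literally. The function name `permutation_p_value` and the scores in the usage note are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treatment, control):
    """Two-sided exact permutation p value for a difference in group means.

    Under the null hypothesis the group labels are arbitrary, so every way
    of splitting the pooled scores into two groups is equally likely. The
    p value is the share of splits whose mean difference is at least as
    large as the one actually observed.
    """
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    n_ctrl = len(pooled) - n_treat
    pooled_sum = sum(pooled)
    observed = abs(mean(treatment) - mean(control))

    at_least_as_extreme = 0
    total_splits = 0
    for idx in combinations(range(len(pooled)), n_treat):
        group_sum = sum(pooled[i] for i in idx)
        diff = abs(group_sum / n_treat - (pooled_sum - group_sum) / n_ctrl)
        # Small tolerance guards against floating-point rounding on ties.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
        total_splits += 1
    return at_least_as_extreme / total_splits
```

With two hypothetical groups of five scores, say `permutation_p_value([12, 14, 15, 16, 18], [8, 9, 10, 11, 13])`, the function checks all 252 possible splits. A result below .05 would, by the convention described above, count as statistically significant. Enumerating every split is only practical for small samples.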
From the 36 journal issues, Masicampo and Lalande identified 3,627 reported p values between .01 and .10. Their method was to see how evenly the p values were spread across that range (only studies that reported a precise figure were included). To avoid a bias in their approach, they counted the number of p values falling into “buckets” of different sizes, either .01, .005, .0025 or .00125, across the range.
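The bucket-counting step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the researchers' own procedure: the function name `count_p_values` is hypothetical, and the choice to make each bucket include its lower edge and exclude its upper edge is an assumption, since the article does not say how values on a boundary were assigned.

```python
def count_p_values(p_values, width, low=0.01, high=0.10):
    """Count p values in equal-width buckets covering [low, high).

    Assumption: each bucket includes its lower edge and excludes its upper
    edge. Values are converted to whole units of 0.00001 so that boundary
    values such as .05 are not misplaced by floating-point rounding.
    """
    scale = 100_000
    low_i, high_i = round(low * scale), round(high * scale)
    width_i = round(width * scale)
    counts = [0] * ((high_i - low_i) // width_i)
    for p in p_values:
        p_i = round(p * scale)
        if low_i <= p_i < high_i:
            counts[(p_i - low_i) // width_i] += 1
    return counts


# The four bucket widths mentioned in the article.
BUCKET_WIDTHS = (0.01, 0.005, 0.0025, 0.00125)
```

Running the same list of p values through each width in `BUCKET_WIDTHS` gives 9, 18, 36 and 72 buckets respectively. Repeating the count at several widths is a check that any pattern in the counts is not an artefact of one particular bucket size.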
The spread of p values between .01 and .10 followed an exponential curve – from .10 to .01 the number of p values increased gradually. But here’s the key finding – there was a glaring bump in the distribution between .045 and .050. The number of p values falling in this range was “much greater” than you’d expect based on the frequency of p values falling elsewhere in the distribution. In other words, an uncanny abundance of reported results just sneaked into the region of statistical significance.
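One way to put a number on “much greater than you’d expect” is to fit an exponential curve to all the buckets except the one of interest, then compare that bucket’s actual count with the curve’s prediction. The sketch below does this with an ordinary least-squares fit to the logarithm of the counts. The article does not describe the researchers’ fitting procedure, so this approach is an assumption, and the function name `expected_count` and the counts in the example are made up purely for illustration.

```python
import math


def expected_count(midpoints, counts, target):
    """Predict the count in bucket `target` from an exponential trend.

    Fits log(count) = intercept + slope * midpoint by least squares using
    every bucket except `target`, then evaluates the fitted curve at the
    target bucket's midpoint. All counts used in the fit must be positive.
    """
    xs = [x for i, x in enumerate(midpoints) if i != target]
    ys = [math.log(c) for i, c in enumerate(counts) if i != target]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return math.exp(intercept + slope * midpoints[target])


# Made-up counts for nine buckets of width .01 (illustration only, not the
# study's data). Bucket index 3 covers .04 to .05.
midpoints = [0.015 + 0.01 * i for i in range(9)]
counts = [820, 610, 470, 520, 270, 210, 160, 120, 90]
ratio = counts[3] / expected_count(midpoints, counts, 3)
print(f"observed / expected in the .04-.05 bucket: {ratio:.2f}")
```

A ratio well above 1 would indicate more p values in that bucket than the surrounding trend predicts. A fuller analysis would also need a measure of uncertainty around the fitted curve, which this sketch leaves out.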
“Biases linked to achieving statistical significance appear to have a measurable impact on the research publication process,” the researchers said.
The same general pattern was found regardless of whether Masicampo and Lalande analysed results from just one journal or all of them together, and mostly regardless of the size of the distribution buckets they looked at. Of course, there’s a chance the intent behind their investigations could have biased their analyses in some way. To check this, a research assistant completely blind to the study aims analysed p values from one of the journals – the same result was found.
Masicampo and Lalande said their findings pointed to the need to educate researchers about the proper interpretation of null hypothesis significance testing and the value of alternative approaches, such as reporting effect sizes and confidence intervals. “… [T]he field may benefit from practices aimed at counteracting the single-minded drive toward achieving statistical significance,” they said.
Masicampo, E. J., and Lalande, D. R. (2012). A peculiar prevalence of p values just below .05. Quarterly Journal of Experimental Psychology. PMID: 22853650