Often when we discuss the replication crisis in psychology, the main focus is on what it means for the research community — how do research practices need to change, for instance, or which sub-disciplines are most affected? These are all important questions, of course. But there’s another that perhaps receives less attention: what do the general public think about the field of psychology when they hear that supposedly key findings are not reproducible?
As most observers of psychological science recognise, the field is in the midst of a replication crisis. Multiple high-profile efforts to replicate past findings have turned up some dismal results — in the 2015 Open Science Collaboration published in Science, for example, just 36% of the evaluated studies showed statistically significant effects the second time around. The results of Many Labs 2, published last year, weren’t quite as bad, but still pretty dismal: just 50% of studies replicated during that effort.
Some of these failed replications don’t come across as all that surprising, at least in retrospect, given the audacity of original claims. For example, a study published in Science in 2012 claimed that subjects who looked at an image of The Thinker had, on average, a 20-point lower belief in God on a 100-point scale than those who looked at a supposedly less analytical statue of a discus thrower, leading to the study’s headline finding that “Analytic Thinking Promotes Religious Disbelief.” It’s an astonishing and unlikely result given how tenaciously most people cling to (non)belief — it defies common sense to think simply looking at a statue could have such an effect. “In hindsight, our study was outright silly,” the lead author admitted to Vox after the study failed to replicate. Plenty of other psychological studies have made similarly bold claims.
In light of this, an interesting, obvious question is how much stock we should put into this sort of intuition: does it actually tell us something useful when a given psychological result seems unlikely on an intuitive level? After all, science is replete with real discoveries that seemed ridiculous at first glance.
To win a medal of any kind at the Olympic Games takes years of training, hard work and sacrifice. Standing on an Olympic podium is widely regarded as the pinnacle of an athlete’s career. Nonetheless, only one athlete can win gold, leaving the two runner-up medallists to ponder what might have been. Intriguingly a seminal study from the 1992 Olympic Games suggested that this counterfactual thinking was especially painful for silver medallists, who appeared visibly less happy than bronze medallists. The researchers speculated that this may have been because of the different counterfactual thinking they engaged in, with bronze medallists being happy that they didn’t come fourth while silver medallists felt sad that they didn’t win gold.
However, subsequent research based on the 2000 Olympic Games did not replicate this finding: this time silver medallists were found to be happier than bronze medallists. To further muddy the waters, a study from the 2004 Games was consistent with the seminal research, finding that straight after competition, gold and bronze medallists were more likely to smile than silver medallists, with these smiles being larger and more intense.
Now further insight into the psychology of coming second or third comes via Mark Allen, Sarah Knipler and Amy Chan of the University of Wollongong, who have released their findings based on the 2016 Olympic Games. These latest results, published in Journal of Sports Sciences, again challenge that initial eye-grabbing result that suggested bronze medallists are happier than silver medallists, but they support the idea that the nature of counterfactual thinking differs depending on whether athletes come second or third.
There’s no simple explanation for why psychology has been hit so hard by the replication crisis – it’s the result of a complicated mix of professional incentives, questionable research practices, and other factors, including the sheer popularity of the sorts of sexy, counterintuitive findings that make for great TED Talk fodder.
But that might not be the entire story. Some have also posited a more sociological explanation: political bias. After all, psychology is overwhelmingly liberal. Estimates vary and depend on the methodology used to generate them, but among professional psychologists the ratio of liberals to conservatives is something like 14:1. A new PsyArXiv preprint first-authored by Diego Reinero at New York University – and involving an “adversarial collaboration” in which “ two sets of authors were simultaneously testing the same question with different theoretical commitments” – has looked for evidence to support this explanation, and found that while liberal bias per se is not associated with research replicability, highly politically biased findings of either slant (liberal or conservative) are less robust.
Now researchers have reproduced the results of another highly-cited study. Back in 2002, Emily Pronin and colleagues first described the “bias blind spot”, the finding that people believe they are less biased in their judgments and behaviour than the general population – that is, they are “blind” to their own cognitive biases. And while that study kick-started a whole line of related research, no one had attempted to directly replicate the original experiments.
But in a preregistered preprint published recently to ResearchGate, Prasad Chandrashekar, Siu Kit Yeung and colleagues report reproducing the original study, first in a small group of Hong Kong undergraduates, and then in two larger samples of 303 and 621 Americans who completed online surveys.
As the list of failed replications continues to build, psychology’s reproducibility crisis is becoming harder to ignore. Now, in a new paper that seems likely to ruffle a few feathers, researchers suggest that even many apparent successful replications in neuroimaging research could be standing on shaky ground.As the paper’s title bluntly puts it, the way imaging results are currently analysed “allows presenting anything as a replicated finding.”
The provocative argument is put forward by YongWook Hong from Sungkyunkwan University in South Korea and colleagues, in a preprint posted recently to bioRxiv. The fundamental problem, say the researchers, is that scientists conducting neuroimaging research tend to make and test hypotheses with reference to large brain structures. Yet neuroimaging techniques, particularly functional magnetic resonance imaging (fMRI), gather data at a much more fine-grained resolution.
This means that strikingly different patterns of brain activity could produce what appears to be the same result. For example, one lab might find that a face recognition task activates the amygdala (a structure found on each side of the brain that’s involved in emotional processing). Later, another lab apparently replicates this finding, showing activation in the same structure during the same task. But the amygdala contains hundreds of individual “voxels”, the three-dimensional pixels that form the basic unit of fMRI data. So the second lab could have found activity in a completely different part of the amygdala, yet it would appear that they had replicated the original result.
While psychology has been mired in a “replication crisis” recently – based on the failure of contemporary researchers to recreate some of its most cherished findings – there have been pockets of good news for certain sub-disciplines in the field. For instance, some replication efforts in cognitive psychology and experimental philosophy or X-phi have been more successful, suggesting that results in these areas are more robust.
To this more optimistic list we may now add personality psychology, or at least the specific area of research linking the Big Five personality trait scores with various personal and life outcomes, such as higher Neuroticism being associated with poorer mental health and reduced relationship satisfaction; higher trait Conscientiousness being associated with less risk of substance abuse; and stronger Extraversion correlating with leadership roles.
In his new paper that is in press at Psychological Science (and available as a preprint at the Open Science Framework), Christopher Soto at Colby College speculates that perhaps it is the tendency for researchers in personality to use large samples of participants, numbering in the hundreds or thousands, and to use reliable, standardised tests, that is to some extent responsible for the relatively robust results in this area. The new findings “leave us cautiously optimistic about the current state and future prospects of the personality-outcome literature,” Soto writes.
Stereotype threat is a very evocative, disturbing idea: Imagine if simply being reminded that you are a member of a disadvantaged group, and that stereotypes hold that members of your group are bad at certain tasks, led to a self-fulfilling prophecy in which you performed worse on such tasks than you would otherwise.
That’s been the claim of stereotype threat researchers since the concept was first introduced in the mid-1990s, and it’s spread far and wide. But as seems to be the case with so many strong psychological claims of late, in recent years the picture has gotten a bit murkier. “A recent review suggested that stereotype threat has a robust but small-to-medium sized effect on performance,” wrote Alex Fradera here at the BPS Research Digest in 2017, “but a meta-analysis suggests that publication bias may be a problem in this literature, inflating the apparent size of the effect.” Adding to the confusion are some results which seem to run exactly opposite to what the theory would suspect, like the one Fradera was reporting on: In that study, female chess players were found to have performed better, not worse, against male opponents, which isn’t what the theory would have predicted.
Now, another study is poised to complicate things yet further. In a paper to be published in the European Journal of Social Psychology, and available as a preprint, a team led by Charlotte Pennington of UWE Bristol recruited female participants to test two mechanisms (reduced effort and working memory disruption) that have been offered to explain the supposed adverse performance effects of gender-related stereotype threat. They also compared different ways of inducing stereotype threat. Interesting questions, you might think, but in all cases the researchers came up empty.
If you Google “holding a warm cup of coffee can” you’ll get a handful of results all telling the same story based on social priming research (essentially the study of how subtle cues affect human thoughts and behavior). “Whether a person is holding a warm cup of coffee can influence his or her views of other people, and a person who has experienced rejection may begin to feel cold,” notes a New York Times blog post, while a Psychology Today article explains that research shows that “holding a warm cup of coffee can make you feel socially closer to those around you.”
These kind of findings are most often associated with John Bargh, a Yale University professor and one of the godfathers of social priming. In his 2017 book Before You Know It: The Unconscious Reasons We Do What We Do, Bargh goes further, even suggesting – based on social priming studies and a small study that found two hours of “hyperthermia” treatment with an infra lamp helped depressed in-patients – that soup might be able to treat depression. “After all,” he writes, “it turns out that a warm bowl of chicken soup really is good for the soul, as the warmth of the soup helps replace the social warmth that may be missing from the person’s life, as when we are lonely or homesick.” He continues, “These simple home remedies are unlikely to make big profits for the pharmaceutical and psychiatric industries, but if the goal is a broader and more general increase in public mental health, some research into their possible helpfulness could pay big dividends for individuals currently in distress, and for society as a whole.”
Replicating a study isn’t easy. Just knowing how the original was conducted isn’t enough. Just having access to a sample of experimental participants isn’t enough. As psychological researchers have known for a long time, all sorts of subtle cues can affect how individuals respond in experimental settings. A failure to replicate, then, doesn’t always mean that the effect being studied isn’t there – it can simply mean the new study was conducted a bit differently.
Many Labs 2, a project of the Center for Open Science at the University of Virginia, embarked on one of the most ambitious replication efforts in psychology yet – and did so in a way designed to address these sorts of critiques, which have in some cases hampered past efforts. The resultant paper, a preprint of which can be viewed here, is lead-authored by Richard A. Klein of the Université Grenoble Alpes. Klein and his very, very large team – it takes almost four pages of the preprint just to list all the contributors – “conducted preregistered replications of 28 classic and contemporary published findings with protocols that were peer-reviewed in advance to examine variation in effect magnitudes across sample and setting.”