Often when we discuss the replication crisis in psychology, the main focus is on what it means for the research community — how do research practices need to change, for instance, or which sub-disciplines are most affected? These are all important questions, of course. But there’s another that perhaps receives less attention: what do the general public think about the field of psychology when they hear that supposedly key findings are not reproducible?
As most observers of psychological science recognise, the field is in the midst of a replication crisis. Multiple high-profile efforts to replicate past findings have turned up some dismal results. In the 2015 Open Science Collaboration paper published in Science, for example, just 36% of the evaluated studies showed statistically significant effects the second time around. The results of Many Labs 2, published last year, weren’t quite as bad, but were still discouraging: just 50% of studies replicated during that effort.
Some of these failed replications don’t come across as all that surprising, at least in retrospect, given the audacity of the original claims. For example, a study published in Science in 2012 claimed that subjects who looked at an image of The Thinker had, on average, a 20-point lower belief in God on a 100-point scale than those who looked at a supposedly less analytical statue of a discus thrower, leading to the study’s headline finding that “Analytic Thinking Promotes Religious Disbelief.” It’s an astonishing and unlikely result given how tenaciously most people cling to (non)belief; it defies common sense to think simply looking at a statue could have such an effect. “In hindsight, our study was outright silly,” the lead author admitted to Vox after the study failed to replicate. Plenty of other psychological studies have made similarly bold claims.
In light of this, an interesting, obvious question is how much stock we should put into this sort of intuition: does it actually tell us something useful when a given psychological result seems unlikely on an intuitive level? After all, science is replete with real discoveries that seemed ridiculous at first glance.
In a world made for right-handed people, life can sometimes be frustrating if you are among the 10% or so who are “adextral” — that is, left-handed or ambidextrous. Now a new grievance can be added to the list. Brain imaging researchers are systematically excluding adextrals from participating in their studies, according to an analysis of recent research papers published in top neuroimaging journals. Yet there’s no good reason to exclude this population, say the authors — and in fact, the practice could be detrimental to research.
As father to an 18-month-old toddler, I would love to know exactly what my son is thinking. Along with many parents, one of the ways I try to find out is to ask him questions of the variety “Do you want X or Y?” But does his answer to this type of question actually reveal his preference, or is it more a reflection of a quirky cognitive bias that is more powerful in children than adults?
Two competing effects influence how adults respond to binary choices. The first is called “The Primacy Effect”, which describes the way that the first option we hear tends to stick in our minds. For example, one study found that adults are more likely to choose “heads” when asked if a coin toss is going to be “heads or tails”. The second effect, which sometimes contrasts with the Primacy Effect, is called “The Recency Effect”. This captures the way that the last thing we hear or experience can also have more weight in our memory (this is why popstars end their concerts on their best songs, so that everyone leaves thinking the whole gig was great). In adults, neither the Primacy Effect nor the Recency Effect is always the more pronounced, with evidence suggesting that one’s personality type, the familiarity of the information, and how controversial the topic is all play a mediating role.
Keen to test if the same is true in young children, or if one of the effects is more dominant, a team led by Emily Sumner conducted two experiments and published their findings in a recent paper in PLOS One fantastically titled “Cake or Broccoli: Recency Biases Children’s Verbal Responses”.
You’ve been transported deep beneath the earth into a labyrinth of tunnels. You have a sword and a communications device, and your objective is to return to the surface. A figure appears in the dark ahead of you. Do you: (a) Use your communications device to say hello; (b) Formulate a contingency plan for escape and then approach the figure; or (c) Pause a moment to try to read its body language before stepping forward to approach the figure? [to interpret your preference, see end of post]
Personality traits are traditionally assessed by asking people to rate how much various descriptive statements match their own personality, like “I enjoy talking to strangers”. This cheap and easy approach has enjoyed great success – people’s scores on such tests tend to be impressively consistent over time, and they predict important outcomes from health to career success. However, the questionnaires are far from perfect. Research volunteers might not properly engage out of boredom, for instance. Job candidates might deliberately fake their scores to give a favourable impression.
An exciting possibility for overcoming these issues, according to a new paper in Personality and Individual Differences, is to use a “gamification” approach – present people with behavioural options in engaging game-like scenarios and deduce their personality traits from their choices.
In case you hadn’t noticed, there is an ongoing debate about the existence of differences between women’s and men’s brains, and the extent to which these might be linked to biological or to cultural factors. In this debate, a real game-changer of a study would involve the identification of clear-cut sex differences in foetal brains: that is, in brains that have not yet been exposed to all the different expectations and experiences that the world might offer. A recent open-access study published in Developmental Cognitive Neuroscience by Muriah Wheelock at the University of Washington and her colleagues, including senior researcher Moriah Thomason at New York University School of Medicine, claims to have done just that. The researchers themselves hail it as “confirmation that sexual dimorphism in functional brain systems emerges during human gestation”, and the popular press has followed suit in various ways, as in The Times of London’s headline: “Proof at last: women and men are born to be different”.
Does this study live up to the claims made by its authors and, more excitedly, those passing the message on? I think not.
There’s no simple explanation for why psychology has been hit so hard by the replication crisis – it’s the result of a complicated mix of professional incentives, questionable research practices, and other factors, including the sheer popularity of the sorts of sexy, counterintuitive findings that make for great TED Talk fodder.
But that might not be the entire story. Some have also posited a more sociological explanation: political bias. After all, psychology is overwhelmingly liberal. Estimates vary and depend on the methodology used to generate them, but among professional psychologists the ratio of liberals to conservatives is something like 14:1. A new PsyArXiv preprint first-authored by Diego Reinero at New York University – and involving an “adversarial collaboration” in which “two sets of authors were simultaneously testing the same question with different theoretical commitments” – has looked for evidence to support this explanation, and found that while liberal bias per se is not associated with research replicability, highly politically biased findings of either slant (liberal or conservative) are less robust.
The age of social media has opened up exciting opportunities for researchers to investigate people’s emotional states on a massive scale. For example, one study found that tweets contain more positive emotional words in the morning, which was interpreted as showing that most people are in a better mood at that time of day.
The premise of this line of research is that our word choices reflect our psychological states – that if someone uses more positive or negative emotional words, this is a good indication that they are actually experiencing those emotions. But now a new study has thrown a spanner in the works, finding that – for spoken language at least – this assumption might not hold up. In their preprint posted recently on PsyArXiv, Jessie Sun and colleagues found that emotion-related words do not in fact provide a good indication of a person’s mood, although there may be other sets of words that do.
One of the greatest temptations for psychologists is to report “marginally significant” research results. When statistical tests spit out values that are tantalisingly close to reaching significance, many just can’t help themselves.
Now a study in Psychological Science has shown just how widespread this practice is. Anton Olsson-Collentine and colleagues from Tilburg University analysed three decades of psychology papers and found that a whopping 40 per cent of p-values between 0.05 and 0.1 – i.e. those not significant according to conventional thresholds – were described by experimenters as “marginally significant”.
As the list of failed replications continues to build, psychology’s reproducibility crisis is becoming harder to ignore. Now, in a new paper that seems likely to ruffle a few feathers, researchers suggest that even many apparently successful replications in neuroimaging research could be standing on shaky ground. As the paper’s title bluntly puts it, the way imaging results are currently analysed “allows presenting anything as a replicated finding.”
The provocative argument is put forward by YongWook Hong from Sungkyunkwan University in South Korea and colleagues, in a preprint posted recently to bioRxiv. The fundamental problem, say the researchers, is that scientists conducting neuroimaging research tend to make and test hypotheses with reference to large brain structures. Yet neuroimaging techniques, particularly functional magnetic resonance imaging (fMRI), gather data at a much more fine-grained resolution.
This means that strikingly different patterns of brain activity could produce what appears to be the same result. For example, one lab might find that a face recognition task activates the amygdala (a structure found on each side of the brain that’s involved in emotional processing). Later, another lab apparently replicates this finding, showing activation in the same structure during the same task. But the amygdala contains hundreds of individual “voxels”, the three-dimensional pixels that form the basic unit of fMRI data. So the second lab could have found activity in a completely different part of the amygdala, yet it would appear that they had replicated the original result.
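To make the amygdala example concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration: the voxel counts and indices are invented, the function names are hypothetical, and the Dice coefficient is simply one common way to measure overlap between two sets, not necessarily the measure used in the preprint. It shows how two labs can both pass a region-level check while sharing no voxels at all.

```python
# Toy illustration only: all voxel indices and counts below are made up.

def region_active(active_voxels, roi_voxels):
    """Region-level criterion: does any active voxel fall inside the region?"""
    return len(active_voxels & roi_voxels) > 0


def dice_overlap(a, b):
    """Dice coefficient between two voxel sets (1.0 = identical, 0.0 = disjoint)."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical region of interest made of 200 voxels, labelled 0..199.
roi = set(range(200))

# Hypothetical sets of significant voxels reported by two labs.
lab_a = set(range(10, 40))     # 30 voxels in one part of the region
lab_b = set(range(150, 180))   # 30 voxels in a different part

# Both labs "find activation in the region"...
print(region_active(lab_a, roi), region_active(lab_b, roi))

# ...yet the voxel-level overlap between their results is zero.
print(dice_overlap(lab_a, lab_b))
```

Judged at the level of the whole structure, the second result counts as a replication of the first; judged voxel by voxel, the two findings have nothing in common.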