You’ve been transported deep beneath the earth into a labyrinth of tunnels. You have a sword and a communication device, and your objective is to return to the surface. A figure appears in the dark ahead of you. Do you: (a) Use your communication device to say hello; (b) Formulate a contingency plan for escape and then approach the figure; or (c) Pause a moment to try to read its body language before stepping forward to approach the figure? [to interpret your preference, see end of post]
Personality traits are traditionally assessed by asking people to rate how much various descriptive statements match their own personality, like “I enjoy talking to strangers”. This cheap and easy approach has enjoyed great success – people’s scores on such tests tend to be impressively consistent over time, and they predict important outcomes from health to career success. However, the questionnaires are far from perfect. Research volunteers might not properly engage out of boredom, for instance. Job candidates might deliberately fake their scores to give a favourable impression.
An exciting possibility for overcoming these issues, according to a new paper in Personality and Individual Differences, is to use a “gamification” approach – present people with behavioural options in engaging game-like scenarios and deduce their personality traits from their choices.
In case you hadn’t noticed, there is an ongoing debate about the existence of differences between women’s and men’s brains, and the extent to which these might be linked to biological or to cultural factors. In this debate, a real game-changer of a study would involve the identification of clear-cut sex differences in foetal brains: that is, in brains that have not yet been exposed to all the different expectations and experiences that the world might offer. A recent open-access study published in Developmental Cognitive Neuroscience by Muriah Wheelock at the University of Washington and her colleagues, including senior researcher Moriah Thomason at New York University School of Medicine, claims to have done just that. The researchers themselves hail it as “confirmation that sexual dimorphism in functional brain systems emerges during human gestation”, and the popular press has greeted it in various ways, for example with The Times of London’s headline: “Proof at last: women and men are born to be different”.
Does this study live up to the claims made by its authors and, more excitedly, those passing the message on? I think not.
There’s no simple explanation for why psychology has been hit so hard by the replication crisis – it’s the result of a complicated mix of professional incentives, questionable research practices, and other factors, including the sheer popularity of the sorts of sexy, counterintuitive findings that make for great TED Talk fodder.
But that might not be the entire story. Some have also posited a more sociological explanation: political bias. After all, psychology is overwhelmingly liberal. Estimates vary and depend on the methodology used to generate them, but among professional psychologists the ratio of liberals to conservatives is something like 14:1. A new PsyArXiv preprint first-authored by Diego Reinero at New York University – and involving an “adversarial collaboration” in which “two sets of authors were simultaneously testing the same question with different theoretical commitments” – has looked for evidence to support this explanation, and found that while liberal bias per se is not associated with research replicability, highly politically biased findings of either slant (liberal or conservative) are less robust.
The age of social media has opened up exciting opportunities for researchers to investigate people’s emotional states on a massive scale. For example, one study found that tweets contain more positive emotional words in the morning, which was interpreted as showing that most people are in a better mood at that time of day.
The premise of this line of research is that our word choices reflect our psychological states – that if someone uses more positive or negative emotional words, this is a good indication that they are actually experiencing those emotions. But now a new study has thrown a spanner in the works, finding that – for spoken language at least – this assumption might not hold up. In their preprint posted recently on PsyArXiv, Jessie Sun and colleagues found that emotion-related words do not in fact provide a good indication of a person’s mood, although there may be other sets of words that do.
One of the greatest temptations for psychologists is to report “marginally significant” research results. When statistical tests spit out values that are tantalisingly close to reaching significance, many just can’t help themselves.
Now a study in Psychological Science has shown just how widespread this practice is. Anton Olsson-Collentine and colleagues from Tilburg University analysed three decades of psychology papers and found that a whopping 40 per cent of p-values between 0.05 and 0.1 – i.e. those not significant according to conventional thresholds – were described by experimenters as “marginally significant”.
As the list of failed replications continues to build, psychology’s reproducibility crisis is becoming harder to ignore. Now, in a new paper that seems likely to ruffle a few feathers, researchers suggest that even many apparently successful replications in neuroimaging research could be standing on shaky ground. As the paper’s title bluntly puts it, the way imaging results are currently analysed “allows presenting anything as a replicated finding.”
The provocative argument is put forward by YongWook Hong from Sungkyunkwan University in South Korea and colleagues, in a preprint posted recently to bioRxiv. The fundamental problem, say the researchers, is that scientists conducting neuroimaging research tend to make and test hypotheses with reference to large brain structures. Yet neuroimaging techniques, particularly functional magnetic resonance imaging (fMRI), gather data at a much more fine-grained resolution.
This means that strikingly different patterns of brain activity could produce what appears to be the same result. For example, one lab might find that a face recognition task activates the amygdala (a structure found on each side of the brain that’s involved in emotional processing). Later, another lab apparently replicates this finding, showing activation in the same structure during the same task. But the amygdala contains hundreds of individual “voxels”, the three-dimensional pixels that form the basic unit of fMRI data. So the second lab could have found activity in a completely different part of the amygdala, yet it would appear that they had replicated the original result.
Stereotype threat is a very evocative, disturbing idea: Imagine if simply being reminded that you are a member of a disadvantaged group, and that stereotypes hold that members of your group are bad at certain tasks, led to a self-fulfilling prophecy in which you performed worse on such tasks than you would otherwise.
That’s been the claim of stereotype threat researchers since the concept was first introduced in the mid-1990s, and it’s spread far and wide. But as seems to be the case with so many strong psychological claims of late, in recent years the picture has gotten a bit murkier. “A recent review suggested that stereotype threat has a robust but small-to-medium sized effect on performance,” wrote Alex Fradera here at the BPS Research Digest in 2017, “but a meta-analysis suggests that publication bias may be a problem in this literature, inflating the apparent size of the effect.” Adding to the confusion are some results that seem to run exactly opposite to what the theory would predict, like the one Fradera was reporting on: in that study, female chess players were found to have performed better, not worse, against male opponents.
Now, another study is poised to complicate things yet further. In a paper to be published in the European Journal of Social Psychology, and available as a preprint, a team led by Charlotte Pennington of UWE Bristol recruited female participants to test two mechanisms (reduced effort and working memory disruption) that have been offered to explain the supposed adverse performance effects of gender-related stereotype threat. They also compared different ways of inducing stereotype threat. Interesting questions, you might think, but in all cases the researchers came up empty.
For a long time, some psychologists have understood that their field has an issue with WEIRDness. That is, psychology experiments disproportionately involve participants who are Western, Educated, and hail from Industrialised, Rich Democracies, which means many findings may not generalise to other populations, such as, say, rural Samoan villagers.
In a new paper in PNAS, a team of researchers led by Mostafa Salari Rad decided to zoom in on a leading psychology journal to better understand the field’s WEIRD problem, evaluate whether things are improving, and come up with some possible changes in practice that could help spur things along.
It has been a long and bumpy road for the implicit association test (IAT), the reaction-time-based psychological instrument that its co-creators, Mahzarin Banaji and Anthony Greenwald, along with others in their orbit, claimed measures test-takers’ levels of unconscious social biases and their propensity to act in a biased and discriminatory manner, be that via racism, sexism, ageism, or some other category, depending on the context. The test’s advocates claimed this was a revelatory development, not least because the IAT supposedly measures aspects of an individual’s bias even beyond what that individual is consciously aware of.
As I explained in a lengthy feature published on New York Magazine’s website last year, many doubts have emerged about these claims, ranging from the question of what the IAT is really measuring (as in, can a reaction-time difference measured in milliseconds really be considered, on its face, evidence of real-world-relevant bias?) to the algorithms used to generate scores to, perhaps most importantly (given that the IAT has become a mainstay of a wide variety of diversity training and educational programmes), whether the test really does predict real-world behaviour.
On that last key point, there is surprising agreement. In 2015 Greenwald, Banaji, and their coauthor Brian Nosek stated that the psychometric issues associated with various IATs “render them problematic to use to classify persons as likely to engage in discrimination”. Indeed, these days IAT evangelists and critics alike mostly agree that the test is too noisy to usefully and accurately gauge people’s likelihood of engaging in discrimination – a finding supported by a series of meta-analyses showing unimpressive correlations between IAT scores and behavioural outcomes (mostly in labs). Race IAT scores appear to account for only about 1 per cent of the variance in measured behavioural outcomes, reports an important meta-analysis available in preprint, co-authored by Nosek. (That meta-analysis also looked at IAT-based interventions, finding that while implicit bias as measured by the IAT “is malleable… changing implicit bias does not necessarily lead to changes in explicit bias or behavior.”)
So where does this leave the IAT? In a new paper in Current Directions in Psychological Science called “The IAT Is Dead, Long Live the IAT: Context-Sensitive Measures of Implicit Attitudes Are Indispensable to Social and Political Psychology”, John Jost, a social psychologist at New York University and a leading IAT researcher, seeks to draw a clear line between the “dead” diagnostic version of the IAT, and what he sees as the test’s real-world version – a sensitive, context-specific measure that shouldn’t be used for diagnostic purposes, but which has potential in various research and educational contexts.
Does this represent a constructive manifesto for the future of this controversial psychological tool? Unfortunately, I don’t think it does – rather, it contains many confusions, false claims, and strawman arguments (as well as a misrepresentation of my own work). Perhaps most frustrating, Jost joins a lengthening line of IAT researchers who, when faced with the fact that the IAT appears to have been overhyped for a long time by its creators, its most enthusiastic proponents, and journalists, respond with an endless variety of counterclaims that don’t quite address the core issue, or which pretend those initial claims were never made in the first place.
Replicating a study isn’t easy. Just knowing how the original was conducted isn’t enough. Just having access to a sample of experimental participants isn’t enough. As psychological researchers have known for a long time, all sorts of subtle cues can affect how individuals respond in experimental settings. A failure to replicate, then, doesn’t always mean that the effect being studied isn’t there – it can simply mean the new study was conducted a bit differently.
Many Labs 2, a project of the Center for Open Science at the University of Virginia, embarked on one of the most ambitious replication efforts in psychology yet – and did so in a way designed to address these sorts of critiques, which have in some cases hampered past efforts. The resultant paper, a preprint of which can be viewed here, is lead-authored by Richard A. Klein of the Université Grenoble Alpes. Klein and his very, very large team – it takes almost four pages of the preprint just to list all the contributors – “conducted preregistered replications of 28 classic and contemporary published findings with protocols that were peer-reviewed in advance to examine variation in effect magnitudes across sample and setting.”