One of the greatest temptations for psychologists is to report “marginally significant” research results. When statistical tests spit out values that are tantalisingly close to reaching significance, many just can’t help themselves.
Now a study in Psychological Science has shown just how widespread this practice is. Anton Olsson-Collentine and colleagues from Tilburg University analysed three decades of psychology papers and found that a whopping 40 per cent of p-values between 0.05 and 0.1 – i.e. those not significant according to conventional thresholds – were described by experimenters as “marginally significant”.
As the list of failed replications continues to build, psychology’s reproducibility crisis is becoming harder to ignore. Now, in a new paper that seems likely to ruffle a few feathers, researchers suggest that even many apparently successful replications in neuroimaging research could be standing on shaky ground. As the paper’s title bluntly puts it, the way imaging results are currently analysed “allows presenting anything as a replicated finding.”
The provocative argument is put forward by YongWook Hong from Sungkyunkwan University in South Korea and colleagues, in a preprint posted recently to bioRxiv. The fundamental problem, say the researchers, is that scientists conducting neuroimaging research tend to make and test hypotheses with reference to large brain structures. Yet neuroimaging techniques, particularly functional magnetic resonance imaging (fMRI), gather data at a much more fine-grained resolution.
This means that strikingly different patterns of brain activity could produce what appears to be the same result. For example, one lab might find that a face recognition task activates the amygdala (a structure found on each side of the brain that’s involved in emotional processing). Later, another lab apparently replicates this finding, showing activation in the same structure during the same task. But the amygdala contains hundreds of individual “voxels”, the three-dimensional pixels that form the basic unit of fMRI data. So the second lab could have found activity in a completely different part of the amygdala, yet it would appear that they had replicated the original result.
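The gap between a region-level match and a voxel-level match can be made concrete with a small sketch. Everything in it is an assumption for illustration: the 6×6×6 block of voxels standing in for a region of interest, the two made-up labs' activation patterns, and the use of the Dice coefficient as the overlap measure. It is not the analysis used by Hong and colleagues.

```python
from itertools import product


def dice_overlap(voxels_a, voxels_b):
    """Dice coefficient between two sets of voxel coordinates.

    1.0 means the two sets are identical, 0.0 means they share no voxels.
    """
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 0.0  # nothing active in either set: treat as no overlap
    return 2.0 * len(a & b) / (len(a) + len(b))


# Hypothetical region of interest: a 6x6x6 block of voxel coordinates.
roi = set(product(range(6), repeat=3))

# Hypothetical results: lab A finds activity in one half of the region,
# lab B finds activity in the other half.
lab_a = {v for v in roi if v[0] < 3}
lab_b = {v for v in roi if v[0] >= 3}

# Region-level question: did both labs see any activity inside the region?
region_level_match = bool(lab_a & roi) and bool(lab_b & roi)

# Voxel-level question: how much do the two activation patterns overlap?
voxel_level_overlap = dice_overlap(lab_a, lab_b)

print(region_level_match, voxel_level_overlap)
```

In this toy case both labs "activate the region", so a region-level comparison counts the second result as a replication, even though the two patterns do not share a single voxel.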
Stereotype threat is a very evocative, disturbing idea: Imagine if simply being reminded that you are a member of a disadvantaged group, and that stereotypes hold that members of your group are bad at certain tasks, led to a self-fulfilling prophecy in which you performed worse on such tasks than you would otherwise.
That’s been the claim of stereotype threat researchers since the concept was first introduced in the mid-1990s, and it’s spread far and wide. But as seems to be the case with so many strong psychological claims of late, in recent years the picture has gotten a bit murkier. “A recent review suggested that stereotype threat has a robust but small-to-medium sized effect on performance,” wrote Alex Fradera here at the BPS Research Digest in 2017, “but a meta-analysis suggests that publication bias may be a problem in this literature, inflating the apparent size of the effect.” Adding to the confusion are some results which seem to run exactly opposite to what the theory would predict, like the one Fradera was reporting on: in that study, female chess players were found to have performed better, not worse, against male opponents.
Now, another study is poised to complicate things yet further. In a paper to be published in the European Journal of Social Psychology, and available as a preprint, a team led by Charlotte Pennington of UWE Bristol recruited female participants to test two mechanisms (reduced effort and working memory disruption) that have been offered to explain the supposed adverse performance effects of gender-related stereotype threat. They also compared different ways of inducing stereotype threat. Interesting questions, you might think, but in all cases the researchers came up empty.
For a long time, some psychologists have understood that their field has an issue with WEIRDness. That is, psychology experiments disproportionately involve participants who are Western, Educated, and hail from Industrialised, Rich Democracies, which means many findings may not generalise to other populations, such as, say, rural Samoan villagers.
In a new paper in PNAS, a team of researchers led by Mostafa Salari Rad decided to zoom in on a leading psychology journal to better understand the field’s WEIRD problem, evaluate whether things are improving, and come up with some possible changes in practice that could help spur things along.
It has been a long and bumpy road for the implicit association test (IAT), the reaction-time-based psychological instrument whose co-creators, Mahzarin Banaji and Anthony Greenwald (among others in their orbit), claimed that it measures test-takers’ levels of unconscious social biases and their propensity to act in a biased and discriminatory manner, be that via racism, sexism, ageism, or some other category, depending on the context. The test’s advocates claimed this was a revelatory development, not least because the IAT supposedly measures aspects of an individual’s bias beyond what that individual is consciously aware of.
As I explained in a lengthy feature published on New York Magazine’s website last year, many doubts have emerged about these claims, ranging from the question of what the IAT is really measuring (as in, can a reaction-time difference measured in milliseconds really be considered, on its face, evidence of real-world-relevant bias?) to the algorithms used to generate scores to, perhaps most importantly (given that the IAT has become a mainstay of a wide variety of diversity training and educational programmes), whether the test really does predict real-world behaviour.
On that last key point, there is surprising agreement. In 2015 Greenwald, Banaji, and their coauthor Brian Nosek stated that the psychometric issues associated with various IATs “render them problematic to use to classify persons as likely to engage in discrimination”. Indeed, these days IAT evangelist and critic alike mostly agree that the test is too noisy to usefully and accurately gauge people’s likelihood of engaging in discrimination — a finding supported by a series of meta-analyses showing unimpressive correlations between IAT scores and behavioral outcomes (mostly in labs). Race IAT scores appear to account for only about 1 per cent of the variance in measured behavioural outcomes, reports an important meta-analysis available in preprint, co-authored by Nosek. (That meta-analysis also looked at IAT-based interventions, finding that while implicit bias as measured by the IAT “is malleable… changing implicit bias does not necessarily lead to changes in explicit bias or behavior.”)
So where does this leave the IAT? In a new paper in Current Directions in Psychological Science called “The IAT Is Dead, Long Live the IAT: Context-Sensitive Measures of Implicit Attitudes Are Indispensable to Social and Political Psychology”, John Jost, a social psychologist at New York University and a leading IAT researcher, seeks to draw a clear line between the “dead” diagnostic version of the IAT, and what he sees as the test’s real-world version – a sensitive, context-specific measure that shouldn’t be used for diagnostic purposes, but which has potential in various research and educational contexts.
Does this represent a constructive manifesto for the future of this controversial psychological tool? Unfortunately, I don’t think it does – rather, it contains many confusions, false claims, and strawman arguments (as well as a misrepresentation of my own work). Perhaps most frustrating, Jost joins a lengthening line of IAT researchers who, when faced with the fact that the IAT appears to have been overhyped for a long time by its creators, its most enthusiastic proponents, and journalists, respond with an endless variety of counterclaims that don’t quite address the core issue itself, or which pretend those initial claims were never made in the first place.
Replicating a study isn’t easy. Just knowing how the original was conducted isn’t enough. Just having access to a sample of experimental participants isn’t enough. As psychological researchers have known for a long time, all sorts of subtle cues can affect how individuals respond in experimental settings. A failure to replicate, then, doesn’t always mean that the effect being studied isn’t there – it can simply mean the new study was conducted a bit differently.
Many Labs 2, a project of the Center for Open Science at the University of Virginia, embarked on one of the most ambitious replication efforts in psychology yet – and did so in a way designed to address these sorts of critiques, which have in some cases hampered past efforts. The resultant paper, a preprint of which can be viewed here, is lead-authored by Richard A. Klein of the Université Grenoble Alpes. Klein and his very, very large team – it takes almost four pages of the preprint just to list all the contributors – “conducted preregistered replications of 28 classic and contemporary published findings with protocols that were peer-reviewed in advance to examine variation in effect magnitudes across sample and setting.”
Psychology as a scientific field enjoys a tremendous level of popularity throughout society, a fascination that could even be described as religious. This is likely the reason why it is one of the most popular undergraduate majors in European and American universities. At the same time, it is not uncommon to encounter the firm opinion that psychology in no way qualifies for consideration as a science. Such extremely critical opinions about psychology are often borrowed from authorities – after all, it was none other than the renowned physicist and Nobel laureate Richard Feynman who, in a famous interview in 1974, compared the social sciences and psychology in particular to a cargo cult. Scepticism toward psychological science can also arise following encounters with the commonplace simplifications and myths spread by pop-psychology, or as a product of a failure to understand what science is and how it solves its dilemmas.
According to William O’Donohue and Brendan Willis of the University of Nevada, these issues are further compounded by undergraduate psychology textbooks. Writing recently in Archives of Scientific Psychology, they argue that “[a] lack of clarity and accuracy in [psych textbooks] in describing what science is and psychology’s relationship to science are at the heart of these issues.” The authors based their conclusions on a review of 30 US and UK undergraduate psychology textbooks, most updated in the last few years (general texts and others covering abnormal, social and cognitive psych), in which they looked for 18 key contemporary issues in philosophy of science.
Part of the strength of the widely endorsed Big Five model of personality is its efficient explanatory power – in the traits of Extraversion, Neuroticism, Openness, Agreeableness and Conscientiousness, it removes the redundancy of more fine-grained approaches and manages to capture the most meaningful variance in our habits of thought and behaviour.
So what to make then of the popular proposal that what marks out high achievers from the rest is that they rank highly on another trait labelled as “Grit”?
Is the recognition of Grit, and the development of a scale to measure it, a breakthrough in our understanding of the psychology of success? Or is it a reinvention of the wheel, a redundant addition to the taxonomy of personality psychology?
In 2016, the US-based authors of a meta-analysis on the topic concluded “that Grit as currently measured is simply a repackaging of Conscientiousness”. Now a different research team, based in Germany and Switzerland, has taken a more intricate look at the links between Grit and Conscientiousness, this time including a focus on their respective facets (or sub-traits). Writing in the European Journal of Personality, Fabian Schmidt and his colleagues conclude that “Grit represents yet another contribution to the common problem of redundant labelling of constructs in personality psychology.”
Amid all the talk of a “replication crisis” in psychology, here’s a rare good news story – a new project has found that a sub-field of the discipline, known as “experimental philosophy” or X-phi, is producing results that are impressively robust.
The current crisis in psychology was largely precipitated by a mass replication attempt published by the Open Science Collaboration (OSC) project in 2015. Of 100 previously published significant findings, only 39 per cent replicated unambiguously, rising to 47 per cent on more relaxed criteria.
Part of my role at the Digest involves sifting through journals looking for research worth covering, and I’ve sensed that modern social psychology generates plenty of studies based on questionnaire data, but far fewer that investigate the kind of tangible behavioural outcomes illuminated by the field’s classics, from Asch’s conformity experiments to Milgram’s research on obedience to authority. A new paper in Social Psychological Bulletin examines this apparent change systematically. Based on his findings, Dariusz Doliński at the SWPS University of Social Sciences and Humanities in Poland asks the bleak question: is psychology still a science of behaviour?