Want To Know Whether A Psychology Study Will Replicate? Just Ask A Bunch Of People

A version of Rodin's The Thinker displayed in Buenos Aires

By guest blogger Jesse Singal

As most observers of psychological science recognise, the field is in the midst of a replication crisis. Multiple high-profile efforts to replicate past findings have turned up some dismal results — in the Open Science Collaboration's 2015 report, published in Science, for example, just 36% of the evaluated studies showed statistically significant effects the second time around. The results of Many Labs 2, published last year, weren’t quite as bad, but still pretty dismal: just 50% of studies replicated during that effort.

Some of these failed replications don’t come across as all that surprising, at least in retrospect, given the audacity of the original claims. For example, a study published in Science in 2012 claimed that subjects who looked at an image of The Thinker had, on average, a 20-point lower belief in God on a 100-point scale than those who looked at a supposedly less analytical statue of a discus thrower, leading to the study’s headline finding that “Analytic Thinking Promotes Religious Disbelief.” It’s an astonishing and unlikely result given how tenaciously most people cling to their (non)belief — it defies common sense to think that simply looking at a statue could have such an effect. “In hindsight, our study was outright silly,” the lead author admitted to Vox after the study failed to replicate. Plenty of other psychological studies have made similarly bold claims.

In light of this, an interesting, obvious question is how much stock we should put into this sort of intuition: does it actually tell us something useful when a given psychological result seems unlikely on an intuitive level? After all, science is replete with real discoveries that seemed ridiculous at first glance.

A new study, available as a preprint, from Suzanne Hoogeveen and colleagues at the University of Amsterdam set out to answer that question by asking laypeople to estimate whether “27 high-profile social science findings” would replicate. The team divided 233 people, none of whom had a PhD in psychology or had heard of the Many Labs replication project, into two groups. One group was simply given a description of each finding; the other was given a description plus an evaluation of the strength of the evidence, based on the study’s so-called “Bayes factor.” (This was presented in a way that required no actual knowledge of Bayesian analysis. For example: “BF = 11.9. This qualifies as strong evidence.”) The participants then rated how confident they were that the finding would replicate on a scale from -100 (extremely confident that the study would not replicate) to 100 (extremely confident that it would replicate).
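To make the “BF = 11.9. This qualifies as strong evidence” example concrete, here is a minimal sketch of how a Bayes factor can be mapped to a plain-language evidence label. The cutoffs below follow the commonly used Jeffreys-style classification; the exact labels and thresholds Hoogeveen and colleagues used may differ, so treat this as an illustration rather than the paper's method.

```python
def evidence_label(bf: float) -> str:
    """Map a Bayes factor (evidence for the effect) to a verbal label,
    using Jeffreys-style thresholds (an assumption, not the paper's exact scheme)."""
    if bf < 1:
        return "evidence against the effect"
    if bf < 3:
        return "anecdotal evidence"
    if bf < 10:
        return "moderate evidence"
    if bf < 30:
        return "strong evidence"
    if bf < 100:
        return "very strong evidence"
    return "extreme evidence"

# The example quoted in the study description:
print(evidence_label(11.9))  # → strong evidence
```

On this scale the quoted BF of 11.9 does indeed fall in the “strong evidence” band (between 10 and 30), consistent with the wording shown to participants.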

Overall, the group provided with just the description did an above-chance job predicting replicability — they did so accurately 58% of the time. Providing an analysis of the strength of evidence boosted that performance significantly, to 67%.
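A simple way to picture how those accuracy figures could be computed: treat a positive rating as a prediction that the study will replicate and a negative rating as a prediction that it won't, then count the proportion of predictions that match the actual replication outcomes. This is a hypothetical reconstruction with made-up data, not the authors' analysis code.

```python
def prediction_accuracy(ratings, replicated):
    """Proportion of correct predictions.

    ratings: confidence scores in [-100, 100]; a positive score is read
             as predicting successful replication.
    replicated: booleans giving each study's actual replication outcome.
    """
    correct = sum((r > 0) == rep for r, rep in zip(ratings, replicated))
    return correct / len(ratings)

# Five illustrative (invented) studies: four predictions match the outcome.
ratings = [80, -60, 15, -90, 40]
replicated = [True, False, False, False, True]
print(prediction_accuracy(ratings, replicated))  # → 0.8
```

Averaged across participants and the 27 findings, a calculation along these lines would yield the 58% and 67% figures reported for the two groups.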

This chart runs down all the studies, and the distribution of replicability ratings the participants gave them. Light grey studies replicated, while darker ones didn’t:

Distribution of replicability ratings for each study, via Hoogeveen et al (2019)

It’s quite clear, visually, that while the participants were stymied by some of the studies in the middle, they did very well with regard to the studies they were most confident about, in either direction. The studies the respondents were most confident would replicate had indeed been successfully replicated, including the finding that “Body cues, not facial expressions, discriminate between intense positive and negative emotions” and the finding that people are less likely to choose to donate to one charity over another when told a significant chunk of their donation would go to administrative costs.

As for the studies almost everyone agreed wouldn’t replicate: one found that players given many chances to guess letters in a “Wheel of Fortune” game subsequently did better on an attention task than those given few chances, and another found that “Washing one’s hands after making a choice eliminates post-decisional dissonance effects, suggesting that hand-washing psychologically removes traces of the past, including concerns about past decisions.” And at the very, very bottom? The “silly” study about the atheism-inspiring statue.

All this suggests that these sorts of judgements can, in fact, provide useful information to the field of psychology as it starts the potentially mammoth undertaking of digging itself out from the replication crisis. And it’s worth noting that this isn’t the first hint that people are pretty good at this sort of prediction: there’s already some evidence that online bettors, at least, can “sniff out weak psychology studies,” as The Atlantic put it.

But there’s a dark side to this finding, too: “These results emphasize that the scientific culture of striving for newsworthy, extreme, and sexy findings is indeed problematic,” the authors note, “as counterintuitive findings are the least likely to replicate.”

This is a rather intense case of clashing incentives: as a research psychologist, you’re more likely to get media write-ups (or, if you’re really lucky, a TED Talk) if you come up with a counterintuitive finding. But those are exactly the findings that are least likely to replicate, which goes a long way toward explaining the mess psychology is in.

Laypeople Can Predict Which Social Science Studies Replicate [this study is a preprint meaning that it has yet to be subjected to peer review and the final published version may differ from the version on which this report was based]

Post written by Jesse Singal (@JesseSingal) for the BPS Research Digest. Jesse is a contributing writer at New York Magazine, and he publishes his own newsletter featuring behavioral-science-talk. He is also working on a book about why shoddy behavioral-science claims sometimes go viral for Farrar, Straus and Giroux.

At Research Digest we’re proud to showcase the expertise and writing talent of our community. Click here for more about our guest posts.

16 thoughts on “Want To Know Whether A Psychology Study Will Replicate? Just Ask A Bunch Of People”

  1. Thank you for this insightful article! The findings suggest that asking laypeople to estimate the ‘replicability’ of published studies is a good indicator of whether the studies will actually replicate. But what if we focus on researchers and scientists instead of laypeople? And what if we used a more sophisticated method to aggregate forecasts (prediction markets) instead of surveys? Would we be able to identify more accurately which studies are more likely to replicate? These are the questions we investigate in our project, Replication Markets. To learn more, visit https://www.replicationmarkets.com. To register, see https://predict.replicationmarkets.com/main/#!/users/register?referral_id=bps

  2. By the way, nutrition research is just as bad. Most of the official dietary guidelines were based on very low-quality evidence. There is a major replication crisis going on in that field that is shaking up what has been assumed to be true for the past half century. Read Gary Taubes and Nina Teicholz.

    1. What is interesting is the difference shown here with psychology. This is not seen in nutritional studies. Why are so many non-experts able to determine whether a psychological study will replicate but can’t determine whether a nutritional study will replicate? Is this because we are trained from a young age to automatically defer to those whom we consider nutritional experts? This is interesting considering that doctors, for example, get almost no education on nutrition and so aren’t experts on it even as they are treated this way.

      In earlier centuries and continuing into the mid-20th century, it was common knowledge that carbohydrates, in particular grains, were fattening. This was common knowledge because it was at a time when much of the population still had some connection to or living memory of rural life and hence farming. It wasn’t that long ago when almost everyone knew that the way to fatten cattle was with carbs and it was also understood that the same thing fattened humans. Then there was a half century propaganda campaign to ‘teach’ Americans that what they previously thought was true was actually wrong.

      We lost the ability to think for ourselves and rely on common sense when it comes to diet. This is partly because almost all traditional knowledge of foodways was destroyed and lost. Check out these following words written in 1963, not that long ago: “Every woman knows that carbohydrates are fattening, this is a piece of common knowledge which few nutritionists would dispute” (R. Passmore & Y. E. Swindels, “Observation on the Respiratory Quotients and Weight Gain of Man After Eating Large Quantities of Carbohydrates”). Except for many decades, most nutritionists were educated to dispute this basic fact. And the general public took it as scientific truth, despite the evidence having been weak and much of the old research not being replicated since.

      Isn’t that amazing? Only in recent years have experts in nutritional studies come around to recognizing the replication crisis. It’s not only this field but more broadly medical studies and the life sciences in general (e.g., genetics). But the same crisis is seen elsewhere as well such as economics.

      What stands out is what is shown in the piece posted above. Even though psychology is in an equally bad replication crisis, the general public is able to see through it to what is likely good science and what is not. This is not true in some other fields of research. Why are people more deferential and less discerning in some areas than others? Common sense should be as applicable, maybe more applicable, to nutrition than to psychology. We all eat on a daily basis, but we don’t sit around thinking about psychology nearly as often.

      1. I think it might be because we have a model of how psychology works available to us. We can think how we would act if we were in that study. The prediction of whether the study might replicate might not be correct for a single person, but it should be correct for a larger set as long as people are able to accurately judge themselves.

        I believe people can judge themselves better than psychology research has suggested, because the controls for estimating how well people judge themselves often seem less reliable than the self-judgements being assessed in the first place. Perhaps self-judgement is not that reliable in the area of skills and knowledge, but I would say it is more so in predicting what one would do.

        With nutrition, you don’t really have a model in your head.

      2. I get the point you’re making. And it could explain some of it. But I’m not sure the real difference here is a model of how psychology works vs skills and knowledge about diet.

        Not many generations ago, most Americans would have had a fairly accurate model of how diet and nutrition works. This model would have been based on traditional knowledge about human health that was built up from millennia of human experience and in-built evolutionary instincts. Modern dietary ideology had to suppress and override our inherited common sense.

        Also, the models we have in psychology aren’t necessarily common sense. That has been shown with the WEIRD bias. It turns out the model we Westerners carry doesn’t always conform to the human nature of most other humans on the planet. Western culture, it turns out, is one of the least representative and most divergent of cultures. The model Westerners carry can’t be generalized to others.

        What makes us healthy, however, is basically the same for all humans everywhere. So, even if Westerners are good at knowing which psychological studies in the West will replicate, they aren’t all that talented at figuring out which psychological studies elsewhere will replicate. Some of the social science studies done on hunter-gatherers, for example, defy Western models of psychology. It’s highly culture-dependent.

        There are obviously complex factors at play here. I’m not sure what we can conclude in comparing the respective replication crises in different fields of research, specifically psychology and nutrition in this case. But it does point toward something thought-provoking.

      3. I didn’t mean common sense or guessing correctly, I specifically mean guessing fairly correctly only about themselves. So it doesn’t matter if they are westerners or not. If you ask a person the survey questions or if you ask them what they think the survey result will be, these two results will probably reflect each other. So for a similar sample of people, you’d get correlation because they predict based on themselves and if you put everyone’s answers together, it makes a similar result.

        Sorry, I’m not good at guessing which part of my thread of thought is obvious and which part needs an explanation, so I failed to explain the big picture and focused on only the aspect of whether one person’s survey results would reflect their guess. When I said model, I didn’t mean that the model has to be good, just an idea of how their mind works, the capacity to predict their own actions and the tendency to wrongfully generalize that idea to everyone else. The wrongful generalization would help with the correlation in this case though.

      1. The problem is that, across many fields of research, even the supposed scientific experts haven’t been able to consistently determine what is junk science and what is not. Or else they could theoretically tell the difference, but various biases got in the way, such as funding sources. Consider that highly profitable pharmaceutical and processed food companies fund not only research but also scientific conferences, scientific journals, development of college curricula, etc… not to mention pay for supplies, food, vacations, etc. for researchers, doctors, and other health professionals.

        That is only a small part of the money these transnational corporations use to exert influence, with such things as lobbying directed not only at governments but at private organizations like the AHA, ADA, and hospitals in shaping dietary recommendations. The Cleveland Clinic, for example, receives millions of dollars from companies that make breakfast cereals and other grain-based foods — is it surprising that the Cleveland Clinic recommends eating lots of grains?

        There are numerous books written by science journalists about this issue of funding and corruption, bias and bad science: Gary Taubes, Nina Teicholz, Joanna Blythman, and Marion Nestle. There are other writers I could add, but those are the main ones that come to mind. If you read those four, you’ll be well-informed. Even so, most people probably already intuitively know this is true, even if we’ve been taught to mistrust our own common sense.

  3. I’m confused by “subjects who looked at an image of The Thinker had, on average, a 20-point higher belief in God”. Shouldn’t it be lower rather than higher?
