Can we trust psychological studies? We speak to Brian Earp, of Oxford University and Yale University, about how to respond when we’re told repeatedly that the veracity of eye-catching findings, or even cherished theories, has come under scrutiny. Brian also talks about his own experience of publishing a failed replication attempt – a must-listen for any researchers who are fearful of publishing their own negative findings. Find Brian on Twitter @BrianDavidEarp
By Alex Fradera
“Reading is the sole means by which we slip, involuntarily, often helplessly, into another’s skin, another’s voice, another’s soul.” So said Joyce Carol Oates, and many more of us suspect that reading good fiction gives us insight into other people.
Past research backs this up, for example providing evidence that people with a long history of reading tend to be better at judging the mental states of others. But this work has always been open to the explanation that sensitive people are drawn to books, rather than books making people more sensitive. However, in 2013 a study came along that appeared to change the game: researchers David Kidd and Emanuele Castano showed that exposure to a single passage of literary fiction actually improved readers’ ability to identify other people’s feelings.
This finding sent ripples through popular media, even prompting some to suggest strategies for everyday life like leafing through a book before you go on a date. But since then, as is the usual pattern in psychology these days, a struggle has ensued to establish the robustness of the eye-catching 2013 result. Continue reading “Three labs just failed to replicate the finding that a quick read of literary fiction boosts your empathy”
Every now and again a psychology finding is published that immediately grabs the world’s attention and refuses to let go – often it’s a result with immediate implications for how we can live more happily and peacefully, or it says something profound about human nature. Said finding then enters the public consciousness, endlessly recycled in pop psychology books and magazine articles.
Unfortunately, sometimes when other researchers have attempted to obtain these same influential findings, they’ve struggled. This replication problem doesn’t just apply to famous findings, nor does it only affect psychological science. And there can be relatively mundane reasons behind failed replications, such as methodological differences from the original or cultural changes since the original was conducted.
But given the public fascination with psychology, and the powerful influence of certain results, it is arguably in the public interest to summarise in one place a collection of some of the most famous findings that have proven tricky to repeat. This is not a list of disproven or dodgy results. It’s a snapshot of the difficult, messy process of behavioural science. Continue reading “Ten Famous Psychology Findings That It’s Been Difficult To Replicate”
The great American psychologist William James proposed that bodily sensations – a thumping heart, a sweaty palm – aren’t merely a consequence of our emotions, but may actually cause them. In his famous example, when you see a bear and your pulse races and you start running, it’s the running and the racing pulse that makes you feel afraid.
Consistent with James’ theory (and similar ideas put forward even earlier by Charles Darwin), a lot of research has shown that the expression on our face seems not only to reflect, but also to shape how we’re feeling. One of the most well-known and highly cited pieces of research to support the “facial feedback hypothesis” was published in 1988 and involved participants looking at cartoons while holding a pen either between their teeth, forcing them to smile, or between their lips, forcing them to pout. Those in the smile condition said they found the cartoons funnier.
But now an attempt to replicate this modern classic of psychology research, involving 17 labs around the world and a collective subject pool of 1894 students, has failed. “Overall, the results were inconsistent with the original result,” the researchers said. Continue reading “No reason to smile – Another modern psychology classic has failed to replicate”
Being watched encourages us to be nicer people – what psychologists call behaving “pro-socially”. Recent evidence has suggested this effect can even be driven by artificial surveillance cues, such as eyes pictured on-screen or painted on a donations jar. If true, this would offer up some simple ways to reduce low-level crime and, well, to encourage us all to treat each other a little better. But unfortunately, a new article in Evolution and Human Behavior calls this into question. Continue reading “Two meta-analyses find no evidence that “Big Brother” eyes boost generosity”
Pick up any introductory psychology textbook and under the “developmental” chapter you’re bound to find a description of “groundbreaking” research into newborn babies’ imitation skills. The work, conducted in the 1970s, will typically be shown alongside black and white images of a man sticking his tongue out at a baby, and the tiny baby duly sticking out her tongue in response.
The research was revolutionary because it appeared to show that humans are born with the power to imitate – a skill crucial to learning and relationships – and it contradicted the claims of Jean Piaget, the grandfather of developmental psychology, that imitation does not emerge until babies are around nine months old.
Today it may be time to rewrite these textbooks. A new study in Current Biology, more methodologically rigorous than any previous investigation of its kind, has found no evidence to support the idea that newborn babies can imitate.
Janine Oostenbroek and her colleagues tested 106 infants four times: at one week of age, then at three weeks, six weeks, and nine weeks. Data from 64 of the infants was available at all four time points. At each test, the researcher performed a range of facial movements, actions or sounds for 60 seconds each. There were 11 of these displays in total, including tongue protrusions, mouth opening, happy face, sad face, index finger pointing and mmm and eee sounds. Each baby’s behaviour during these 60-second periods was filmed and later coded according to which faces, actions or sounds, if any, he or she performed during the different researcher displays.
Whereas many previous studies have compared babies’ responses to only two or a few different adult displays, this study was much more robust because the researchers checked to see if, for example, the babies were more likely to stick out their tongues when that’s what the researcher was doing, as compared with when the researcher was doing any of the 10 other displays or sounds. Unlike most prior research, this new study also looked to see how any signs of imitation changed over time, at the different testing sessions. According to the researchers, this makes theirs “the most comprehensive, longitudinal study of neonatal imitation to date”.
Following these more robust standards, Oostenbroek and her team found no evidence that newborn babies can reliably imitate faces, actions or sounds. Take tongue protrusions, for example: averaged across the different testing time points, the babies were no more likely to stick out their tongue when the researcher did so than when the researcher opened her mouth, pulled a happy face or pulled a sad face. In fact, across all the different displays, actions and sounds, there was no situation in which the babies consistently performed a given facial display, gesture or sound more often when the researcher specifically did that same thing than when the researcher was doing anything else.
Based on their results, the researchers said that the idea of “innate imitation modules” and other such concepts founded on the idea of neonatal imitation “should be modified or abandoned altogether”. They said the truth may be closer to what Piaget originally proposed and that imitation probably emerges from around 6 months.
Oostenbroek, J., Suddendorf, T., Nielsen, M., Redshaw, J., Kennedy-Costantini, S., Davis, J., Clark, S., & Slaughter, V. (2016). Comprehensive Longitudinal Study Challenges the Existence of Neonatal Imitation in Humans. Current Biology. DOI: 10.1016/j.cub.2016.03.047
Top image is part of a figure that appears in Oostenbroek et al. 2016.
10 surprising things babies can do
After some high-profile and at times acrimonious failures to replicate past landmark findings, psychology as a discipline and scientific community has led the way in trying to find out more about why some scientific findings reproduce and others don’t, including instituting reporting practices to improve the reliability of future results. Much of this endeavour is thanks to the Center for Open Science, co-founded by the University of Virginia psychologist Brian Nosek.
Today, the Center has published its latest large-scale project: an attempt by 270 psychologists to replicate findings from 100 psychology studies published in 2008 in three prestigious journals that cover cognitive and social psychology: Psychological Science, the Journal of Personality and Social Psychology, and the Journal of Experimental Psychology: Learning, Memory and Cognition.
The Reproducibility Project is designed to estimate the “reproducibility” of psychological findings and complements the Many Labs Replication Project, which published its initial results last year. The new effort aimed to replicate many different prior results to try to establish the distinguishing features of replicable versus unreliable findings: in this sense it was broad and shallow, looking for general rules that apply across the fields studied. By contrast, the Many Labs Project involved many different teams all attempting to replicate a smaller number of past findings – in that sense it was narrow and deep, providing more detailed insights into specific psychological phenomena.
The headline result from the new Reproducibility Project report is that whereas 97 per cent of the original results showed a statistically significant effect, this was reproduced in only 36 per cent of the replication attempts. Some replications found the opposite effect to the one they were trying to recreate. This is despite the fact that the Project went to incredible lengths to make the replication attempts true to the original studies, including consulting with the original authors.
Just because a finding doesn’t replicate doesn’t mean the original result was false – there are many possible reasons for a replication failure, including unknown or unavoidable deviations from the original methodology. Overall, however, the results of the Project are likely indicative of the biases that researchers and journals show towards producing and publishing positive findings. For example, a survey published a few years ago revealed the questionable practices many researchers use to achieve positive results, and it’s well known that journals are less likely to publish negative results.
The Project found that studies that initially reported weaker or more surprising results were less likely to replicate. In contrast, the expertise of the original and replication research teams was not related to the chances of replication success. Meanwhile, social psychology replications were less than half as likely to achieve a significant finding as cognitive psychology replication attempts, although in terms of decline in effect size, both fields showed the same average reduction from original study to replication attempt, to less than half the original (cognitive psychology studies started out with larger effects, which is why more of the replications in this area retained statistical significance).
Among the studies that failed to replicate was research on loneliness increasing supernatural beliefs; conceptual fluency increasing a preference for concrete descriptions (e.g. if I prime you with the name of a city, that increases your conceptual fluency for the city, which supposedly makes you prefer concrete descriptions of that city); and research on links between people’s racial prejudice and their response times to pictures showing people from different ethnic groups alongside guns. A full list of the findings that the researchers attempted to replicate can be found on the Reproducibility Project website (as can all the data and replication analyses).
This may sound like a disappointing day for psychology, but in fact the opposite is true. Through the Reproducibility Project, psychology and psychologists are blazing a trail, helping shed light on a problem that afflicts all of science, not just psychology. The Project, which was backed by the Association for Psychological Science (publisher of the journal Psychological Science), is a model of constructive collaboration showing how original authors and the authors of replication attempts can work together to further their field. In fact, some investigators on the Project were in the position of being both an original author and a replication researcher.
“The present results suggest there is room to improve reproducibility in psychology,” the authors of the Reproducibility Project concluded. But they added: “Any temptation to interpret these results as a defeat for psychology, or science more generally, must contend with the fact that this project demonstrates science behaving as it should” – that is, being constantly sceptical of its own explanatory claims and striving for improvement. “This isn’t a pessimistic story”, added Brian Nosek in a press conference for the new results. “The project shows science demonstrating an essential quality, self-correction – a community of researchers volunteered their time to contribute to a large project for which they would receive little individual credit.”
Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science. DOI: 10.1126/science.aac4716
How did it feel to be part of the Reproducibility Project?
A replication tour de force
Do psychology findings replicate outside the lab?
A recipe for (attempting to) replicate existing findings in psychology
A special issue of The Psychologist on issues surrounding replication in psychology.
Serious power failure threatens the entire field of neuroscience
Post written by Christian Jarrett (@psych_writer) for the BPS Research Digest.
By guest blogger Neuroskeptic
A widely-used brain stimulation technique may be less effective than previously believed.
Transcranial Direct Current Stimulation (tDCS) is an increasingly popular neuroscience tool. tDCS involves attaching electrodes to the scalp, through which a weak electrical current flows. The idea is that this current modulates the activity of the brain tissue underneath the electrode – safely and painlessly.
Outside of the neuroscience lab, tDCS is also used by hobbyists looking to boost their own brain power and a number of consumer stimulation devices are now being sold. The technique regularly makes the news, under headlines such as “Zapping your brain could help you lose weight”.
However, according to Australian neuroscientists Jared Horvath, Jason Forte and Olivia Carter, a single session of tDCS may have no detectable effect on cognitive function in most people. In a new paper published in the journal Brain Stimulation, Horvath and colleagues reviewed the published evidence on tDCS. They performed a meta-analysis of the data on how tDCS influences cognitive functions such as memory, language, and mental arithmetic.
For example, in experiments investigating language function, neuroscientists generally place the active tDCS electrode over the left frontal lobe of the volunteers. This ensures that the electrode is near to Broca’s area, a part of the brain known to be involved in language production. Then, the current is switched on and the volunteer is asked to do a linguistic task such as verbal fluency, in which the goal is to think of as many words beginning with a certain letter (say “p”) as possible within one minute. The performance of the volunteers given tDCS is compared to the performance of people given “sham” tDCS, in which the electrodes are attached but no current is applied.
Horvath et al. found that overall, there was no statistically significant difference between active and sham tDCS on any of the cognitive tasks that they examined. They say that:
Of the 59 analyses undertaken, tDCS was not found to generate a significant effect on any. Taken together, the evidence does not support the assertion that a single-session of tDCS has a reliable effect on cognitive tasks in healthy adult populations.
That seems pretty clear-cut. However, Horvath et al. acknowledge that their analysis did not include any of the studies that have been conducted on individuals with brain diseases or on the elderly, and they note that tDCS might be more effective in such cases.
What’s more, Horvath et al.’s meta-analysis didn’t include all of the studies on healthy people. The authors decided to only include results that had at least one published independent replication attempt. In other words, they only included studies that had measured the effects of tDCS on a given cognitive task if more than one research group had published papers using that task. Even if one team of scientists had published several studies all showing that tDCS does influence some aspect of cognition, those results weren’t included unless at least one other team of researchers had published tDCS results using that same task. One hundred and seventy-six articles were excluded as a result.
Horvath et al. explain their decision not to consider those studies by saying that:
We chose to exclude measures that have only been replicated by a single research group to ensure all data included in and conclusions generated by this review accurately reflect the effects of tDCS itself, rather than any unique device, protocol, or condition utilized in a single lab.
However, this is a slightly unusual restriction to use on a meta-analysis. It might be interesting to see whether including these additional studies would have changed the results.
This is the second time Horvath, Forte and Carter have published a sceptical meta-analysis of tDCS. In November last year they reviewed studies on the neurophysiological effects of tDCS and concluded that tDCS has virtually no measurable effects on brain function. So Horvath et al. seem to have comprehensively shown that tDCS essentially has no impact in healthy people, either on a biological or on a cognitive level.
This is a really useful review, as it helps us to think about the way we talk about the effects of tDCS.
However, I believe that the way the analysis was conducted may have obscured some of the very real effects of tDCS. The authors made a judgement about which studies could be pooled together and which could not. One always has to make these kinds of decisions, and I am not sure I would have made the same ones faced with the same choices.
tDCS is still a developing technology. I think that with more principled methods of targeting the current flow to the desired brain area, we will see tDCS become one of the standard tools of cognitive neuroscience, just as EEG and fMRI have become.
Horvath, J., Forte, J., & Carter, O. (2015). Quantitative Review Finds No Evidence of Cognitive Effects in Healthy Populations from Single-Session Transcranial Direct Current Stimulation (tDCS). Brain Stimulation. DOI: 10.1016/j.brs.2015.01.400
Post written for the BPS Research Digest by Neuroskeptic, a British neuroscientist who blogs for Discover Magazine.
In his famous 1974 lecture, Cargo Cult Science, Richard Feynman recalls his experience of suggesting to a psychology student that she should try to repeat a previous experiment before attempting a novel one:
“She was very delighted with this new idea, and went to her professor. And his reply was, no, you cannot do that, because the experiment has already been done and you would be wasting time. This was in about 1947 or so, and it seems to have been the general policy then to not try to repeat psychological experiments, but only to change the conditions and see what happened.”
Despite the popularity of the lecture, few took his comments about lack of replication in psychology seriously – and least of all psychologists. Another 40 years would pass before psychologists turned a critical eye on just how often they bother to replicate each other’s experiments. In 2012, US psychologist Matthew Makel and colleagues surveyed the top 100 psychology journals since 1900 and estimated that for every 1000 papers published, just two sought to closely replicate a previous study. Feynman’s instincts, it seems, were spot on.
Now, after decades of the status quo, psychology is finally coming to terms with the idea that replication is a vital ingredient in the recipe of discovery. The latest issue of the journal Social Psychology reports an impressive 15 papers that attempted to replicate influential findings related to personality and social cognition. Are men really more distressed by infidelity than women? Does pleasant music influence consumer choice? Is there an automatic link between cleanliness and moral judgements?
Several phenomena replicated successfully. An influential finding by Stanley Schachter from 1951 on ‘deviation rejection’ was successfully repeated by Eric Wesselmann and colleagues. Schachter had originally found that individuals whose opinions persistently deviate from a group norm tend to be disempowered by the group and socially isolated. Wesselmann replicated the result, though finding that it was smaller than originally supposed.
On the other hand, many supposedly ‘classic’ effects could not be found. For instance, there appears to be no evidence that making people feel physically warm promotes social warmth, that asking people to recall immoral behaviour makes the environment seem darker, or for the Romeo and Juliet effect.
The flagship of the special issue is the Many Labs project, a remarkable effort in which 50 psychologists located in 36 labs worldwide collaborated to replicate 13 key findings, across a sample of more than 6000 participants. Ten of the effects replicated successfully.
Adding further credibility to this enterprise, each of the studies reported in the special issue was pre-registered and peer reviewed before the authors collected data. Study pre-registration ensures that researchers adhere to the scientific method and is rapidly emerging as a vital tool for increasing the credibility and reliability of psychological science.
The entire issue is open access and well worth a read. I think Feynman would be glad to see psychology leaving the cargo cult behind and, for that, psychology can be proud too.
– Further reading: A special issue of The Psychologist on issues surrounding replication in psychology.
Klein, R., Ratliff, K., Vianello, M., Adams, Jr., R., Bahník, Bernstein, M., Bocian, K., Brandt, M., Brooks, B., Brumbaugh, C., Cemalcilar, Z., Chandler, J., Cheong, W., Davis, W., Devos, T., Eisner, M., Frankowska, N., Furrow, D., Galliani, E., Hasselman, F., Hicks, J., Hovermale, J., Hunt, S., Huntsinger, J., IJzerman, H., John, M., Joy-Gaba, J., Kappes, H., Krueger, L., Kurtz, J., Levitan, C., Mallett, R., Morris, W., Nelson, A., Nier, J., Packard, G., Pilati, R., Rutchick, A., Schmidt, K., Skorinko, J., Smith, R., Steiner, T., Storbeck, J., Van Swol, L., Thompson, D., van ’t Veer, A., Vaughn, L., Vranka, M., Wichman, A., Woodzicka, J., & Nosek, B. (2014). Data from Investigating Variation in Replicability: A “Many Labs” Replication Project Journal of Open Psychology Data, 2 (1) DOI: 10.5334/jopd.ad
Post written for the BPS Research Digest by guest host Chris Chambers, senior research fellow in cognitive neuroscience at the School of Psychology, Cardiff University, and contributor to the Guardian psychology blog, Headquarters.
“Out, damned spot!” cries a guilt-ridden Lady Macbeth as she desperately washes her hands in the vain pursuit of a clear conscience. Consistent with Shakespeare’s celebrated reputation as an astute observer of the human psyche, a wealth of contemporary research findings have demonstrated the reality of this close link between our sense of moral purity and physical cleanliness.
One manifestation of this was nicknamed the Macbeth Effect – first documented by Chen-Bo Zhong and Katie Liljenquist in an influential paper in the high-impact journal Science in 2006 – in which feelings of moral disgust were found to provoke a desire for physical cleansing. For instance, in their second study, Zhong and Liljenquist found that US participants who hand-copied a story about an unethical deed were subsequently more likely to rate cleansing products as highly desirable.
There have been many “conceptual replications” of the Macbeth Effect. A conceptual replication is when a different research methodology supports the proposed theoretical mechanism underlying the original effect. For example, last year, Mario Gollwitzer and André Melzer found that novice video gamers showed a strong preference for hygiene products after playing a violent game.
Given the strong theoretical foundations of the Macbeth Effect, combined with several conceptual replications, University of Oxford psychologist Brian Earp and his colleagues were surprised when a pilot study of theirs failed to replicate Zhong and Liljenquist’s second study. This pilot study had been intended as the start of a new project looking to further develop our understanding of the Macbeth Effect. Rather than filing away this negative result, Earp and his colleagues were inspired to examine the robustness of the Macbeth Effect with a series of direct replications. Unlike conceptual replications, direct replications seek to mimic the methods of an original study as closely as possible.
Following best practice guidelines, Earp’s team contacted Zhong and Liljenquist, who kindly shared their original materials. Another feature of a high-quality replication is to ensure you have enough statistical power to replicate the original effect. In psychology, this usually means recruiting an adequate number of participants. Accordingly, Earp’s team recruited 153 undergrad participants – more than five times as many as took part in Zhong and Liljenquist’s second study.
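For readers unfamiliar with power analysis, the relationship between effect size and required sample size can be sketched with a quick calculation. This is a minimal illustration using the standard normal approximation for a two-sample comparison; the numbers are generic textbook values, not figures from Earp’s study or from Zhong and Liljenquist’s original:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect a given
    effect size (Cohen's d) in a two-sample comparison, using the
    normal approximation: n = 2 * ((z_alpha + z_beta) / d) ** 2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed significance threshold
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A "medium" effect (d = 0.5) needs roughly 63 people per group at
# 80 per cent power, while a "small" effect (d = 0.2) needs ~393.
print(n_per_group(0.5))  # → 63
print(n_per_group(0.2))  # → 393
```

The point the calculation makes concrete is that small original samples can only reliably detect large effects, which is why replication teams like Earp’s recruit several times the original numbers.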
Exactly as in the original research, the British students hand-copied a story about an unethical deed (an office worker shreds a vital document needed by a colleague) or about an ethical deed (the office worker finds and saves the document for their colleague). They then rated the desirability and value of several consumer products. These were the exact same products used in the original study – including soap, toothpaste, batteries and fruit juice – except that a few brand names were changed to suit the UK as opposed to US context. Students who copied the unethical story rated the desirability and value of the various hygiene and other products just the same as the students who copied the ethical story. In other words, there was no Macbeth Effect.
It’s possible that the Macbeth Effect is a culturally specific phenomenon. Next, Earp and his team conducted a replication attempt with 156 US participants using Amazon’s Mechanical Turk survey website. The materials and methods were almost identical to the original except that participants were required to re-type and add punctuation to either the ethical or unethical version of the office worker story. Again, exposure to the unethical story made no difference to the participants’ ratings of the value or desirability of the consumer products – with just one anomaly. Participants in the unethical condition placed a higher value on toothpaste. In the context of their other findings, Earp’s team think this is likely a spurious result.
Finally, the exact same procedures were followed with an Indian sample – another culture, that like the US, places high value on moral purity. Nearly three hundred Indian participants were recruited via Amazon’s Mechanical Turk, but again no effect of exposure to an ethical or unethical story was found on ratings of hygiene or other products.
Earp and his colleagues want to be clear – they’re not saying that there is no link between physical and moral purity, nor are they dismissing the existence of a Macbeth Effect. But they do believe their three direct, cross-cultural replication failures call for a “careful reassessment of the evidence for a real-life ‘Macbeth Effect’ within the realm of moral psychology.”
This study, due for publication next year, comes at a time when reformers in psychology are calling for more value to be placed on replication attempts and negative results. “By resisting the temptation … to bury our own non-significant findings with respect to the Macbeth Effect, we hope to have contributed a small part to the ongoing scientific process,” Earp and his colleagues concluded.
Brian D. Earp, Jim A. C. Everett, Elizabeth N. Madva, and J. Kiley Hamlin (2014). Out, damned spot: Can the “Macbeth Effect” be replicated? Basic and Applied Social Psychology, In Press.
— Further reading —
An unsuccessful conceptual replication of the Macbeth Effect was published in 2009 (pdf). Later, in 2011, another paper failed to replicate all four of Zhong and Liljenquist’s studies, although the replications may have been underpowered.
From the Digest archive: Your conscience really can be wiped clean. Feeling clean makes us harsher moral judges.
See also: Psychologist magazine special issue on replications.
Christian Jarrett (@Psych_Writer) is Editor of BPS Research Digest