Not so easy to spot: A failure to replicate the Macbeth Effect across three continents

“Out, damned spot!” cries a guilt-ridden Lady Macbeth as she desperately washes her hands in the vain pursuit of a clear conscience. Consistent with Shakespeare’s celebrated reputation as an astute observer of the human psyche, a wealth of contemporary research findings have demonstrated the reality of this close link between our sense of moral purity and physical cleanliness.

One manifestation of this was nicknamed the Macbeth Effect – first documented by Chen-Bo Zhong and Katie Liljenquist in an influential paper in the high-impact journal Science in 2006 – in which feelings of moral disgust were found to provoke a desire for physical cleansing. For instance, in their second study, Zhong and Liljenquist found that US participants who hand-copied a story about an unethical deed were subsequently more likely to rate cleansing products as highly desirable.

There have been many “conceptual replications” of the Macbeth Effect. A conceptual replication uses a different research methodology to test, and in these cases support, the theoretical mechanism proposed to underlie the original effect. For example, last year, Mario Gollwitzer and André Melzer found that novice video gamers showed a strong preference for hygiene products after playing a violent game.

Given the strong theoretical foundations of the Macbeth Effect, combined with several conceptual replications, University of Oxford psychologist Brian Earp and his colleagues were surprised when a pilot study of theirs failed to replicate Zhong and Liljenquist’s second study. This pilot study had been intended as the start of a new project looking to further develop our understanding of the Macbeth Effect. Rather than filing away this negative result, Earp and his colleagues were inspired to examine the robustness of the Macbeth Effect with a series of direct replications. Unlike conceptual replications, direct replications seek to mimic the methods of an original study as closely as possible.

Following best practice guidelines, Earp’s team contacted Zhong and Liljenquist, who kindly shared their original materials. Another feature of a high-quality replication is having enough statistical power to detect the original effect if it is real. In psychology, this usually means recruiting an adequate number of participants. Accordingly, Earp’s team recruited 153 undergrad participants – more than five times as many as took part in Zhong and Liljenquist’s second study.
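To give a rough feel for why sample size matters so much here, the sketch below estimates statistical power for a simple two-group comparison using a normal approximation. It is an illustration only, not the analysis Earp’s team ran: the function name `approx_power`, the effect size of 0.5 and the group sizes are all assumptions chosen for the example, and none of these numbers come from either study.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided comparison of two group means.

    Uses the normal approximation to the t-test, so the result is
    slightly optimistic when groups are small.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)            # critical value for the test
    shift = effect_size * sqrt(n_per_group / 2)  # expected size of the test statistic
    # Probability that the statistic lands beyond either critical value
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    hypothetical_d = 0.5  # assumed standardised effect size, NOT from either study
    for n in (15, 30, 75):  # illustrative group sizes only
        power = approx_power(hypothetical_d, n)
        print(f"n per group = {n:3d}: approximate power = {power:.2f}")
```

Under these assumptions, power climbs steeply as the groups grow, which is why a five-fold increase in participants makes a failure to find the effect much harder to dismiss as bad luck.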

Exactly as in the original research, the British students hand-copied a story about an unethical deed (an office worker shreds a vital document needed by a colleague) or about an ethical deed (the office worker finds and saves the document for their colleague). They then rated the desirability and value of several consumer products. These were the exact same products used in the original study – including soap, toothpaste, batteries and fruit juice – except that a few brand names were changed to suit the UK as opposed to US context. Students who copied the unethical story rated the desirability and value of the various hygiene and other products just the same as the students who copied the ethical story. In other words, there was no Macbeth Effect.

It’s possible that the Macbeth Effect is a culturally specific phenomenon. To check, Earp and his team next conducted a replication attempt with 156 US participants using Amazon’s Mechanical Turk survey website. The materials and methods were almost identical to the original except that participants were required to re-type and add punctuation to either the ethical or unethical version of the office worker story, rather than copy it by hand. Again, exposure to the unethical story made no difference to the participants’ ratings of the value or desirability of the consumer products – with just one anomaly. Participants in the unethical condition placed a higher value on toothpaste. In the context of their other findings, Earp’s team think this is likely a spurious result.

Finally, the exact same procedures were followed with an Indian sample – another culture that, like the US, places a high value on moral purity. Nearly three hundred Indian participants were recruited via Amazon’s Mechanical Turk, but again no effect of exposure to an ethical or unethical story was found on ratings of hygiene or other products.

Earp and his colleagues want to be clear – they’re not saying that there is no link between physical and moral purity, nor are they dismissing the existence of a Macbeth Effect. But they do believe their three direct, cross-cultural replication failures call for a “careful reassessment of the evidence for a real-life ‘Macbeth Effect’ within the realm of moral psychology.”

This study, due for publication next year, comes at a time when reformers in psychology are calling for more value to be placed on replication attempts and negative results. “By resisting the temptation … to bury our own non-significant findings with respect to the Macbeth Effect, we hope to have contributed a small part to the ongoing scientific process,” Earp and his colleagues concluded.


Brian D. Earp, Jim A. C. Everett, Elizabeth N. Madva, and J. Kiley Hamlin (2014). Out, damned spot: Can the “Macbeth Effect” be replicated? Basic and Applied Social Psychology, In Press.

— Further reading —
An unsuccessful conceptual replication of the Macbeth Effect was published in 2009 (pdf). Later, in 2011, another paper failed to replicate all four of Zhong and Liljenquist’s studies, although the replications may have been underpowered. 

From the Digest archive: Your conscience really can be wiped clean; Feeling clean makes us harsher moral judges.

See also: Psychologist magazine special issue on replications.

Christian Jarrett (@Psych_Writer) is Editor of BPS Research Digest


