Now John Bargh’s Famous Hot-Coffee Study Has Failed To Replicate

By Jesse Singal

If you Google “holding a warm cup of coffee can” you’ll get a handful of results all telling the same story based on social priming research (essentially the study of how subtle cues affect human thoughts and behavior). “Whether a person is holding a warm cup of coffee can influence his or her views of other people, and a person who has experienced rejection may begin to feel cold,” notes a New York Times blog post, while a Psychology Today article explains that research shows that “holding a warm cup of coffee can make you feel socially closer to those around you.”

These kinds of findings are most often associated with John Bargh, a Yale University professor and one of the godfathers of social priming. In his 2017 book Before You Know It: The Unconscious Reasons We Do What We Do, Bargh goes further, even suggesting – based on social priming studies and a small study that found two hours of “hyperthermia” treatment with an infrared lamp helped depressed in-patients – that soup might be able to treat depression. “After all,” he writes, “it turns out that a warm bowl of chicken soup really is good for the soul, as the warmth of the soup helps replace the social warmth that may be missing from the person’s life, as when we are lonely or homesick.” He continues, “These simple home remedies are unlikely to make big profits for the pharmaceutical and psychiatric industries, but if the goal is a broader and more general increase in public mental health, some research into their possible helpfulness could pay big dividends for individuals currently in distress, and for society as a whole.”

Of course, it’s a pretty big leap from social-priming studies conducted in labs, or from a small study of a very unusual population, to the idea that maybe chicken soup can rival Prozac — especially when one remembers that millions of clinically depressed people drink hot coffee every day, to no apparent effect. But this nicely captures a common, and important, critique of social priming: Social-priming enthusiasts, both within and outside of academia, have sometimes over-extrapolated on the basis of limited or questionable findings.

Over-extrapolation is one thing, but what if the original studies simply weren’t robust in the first place? Recently, a team led by Christopher Chabris and Dan Simons (best known for their “invisible gorilla” work on inattentional blindness), plus other colleagues, sought to replicate two of Bargh’s famous temperature-based findings from 2008. As reported by Research Digest at the time, Bargh and his colleague Lawrence Williams contrived a way to have undergrad participants in a lab hold either a hot or a cold drink and then rate a target individual. Students who held the warm drink rated the individual higher on traits having to do with warmth than students who held the cold drink. In a follow-up study conducted out in the world, Williams and Bargh also found that those “holding a hot (versus cold) therapeutic pad were more likely to choose a gift for a friend instead of for themselves,” the theory here being that a sense of warmth makes people more giving and other-focused.

Chabris and his colleagues attempted to replicate both these findings, following the format of the original research quite closely, but on participant samples that were triple the size of the originals, and which likely more closely resemble the general population (for both experiments, rather than relying on undergrads, they recruited people off the streets of Saratoga Springs, New York). As they report in their preprint available at PsyArXiv, and due to be published in Social Psychology, they found no effects of drink temperature or hot pads on their participants’ judgments or behaviour. Moreover, Chabris and his colleagues used Bayesian analysis (a way of estimating probabilities) to show that “there is substantially more evidence for the null hypothesis of no effect than for the original physical warmth priming hypothesis.”
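
To make the sample-size point concrete, here is a rough sketch of how much tripling the sample should help. It rests on assumptions that are not part of the replication report: a simple two-sided z-test approximation, an original result that was just significant at p = .05 (as a commenter below notes), and a true effect exactly as large as the original estimate, which is probably generous to the original finding. The function name `replication_power` is made up for this illustration.

```python
from statistics import NormalDist


def replication_power(original_p, sample_ratio, alpha=0.05):
    """Approximate power of a two-sided z-test replication.

    Assumes (hypothetically) that the true effect equals the effect
    estimated in the original study, and that the replication sample is
    `sample_ratio` times the size of the original sample.
    """
    norm = NormalDist()
    # z-score implied by the original two-sided p-value
    z_original = norm.inv_cdf(1 - original_p / 2)
    # critical z-score the replication must exceed to reach significance
    z_critical = norm.inv_cdf(1 - alpha / 2)
    # expected z-score grows with the square root of the sample size
    z_expected = z_original * sample_ratio ** 0.5
    # probability of landing in either rejection region
    upper_tail = 1 - norm.cdf(z_critical - z_expected)
    lower_tail = norm.cdf(-z_critical - z_expected)
    return upper_tail + lower_tail


same_size = replication_power(original_p=0.05, sample_ratio=1)
tripled = replication_power(original_p=0.05, sample_ratio=3)
print(f"Same-size replication: {same_size:.2f}")
print(f"Tripled sample:        {tripled:.2f}")
```

Under these assumptions, a same-size replication of a just-significant result is roughly a coin flip, while a replication with triple the sample would be expected to reach significance a bit more than nine times out of ten. If the true effect is smaller than the original estimate, both figures drop.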

Of course, as Chabris et al. note, this still leaves open the possibility that the hot-coffee and hot-pad effects are real – perhaps they simply failed to observe a significant result because of statistical bad luck. Or maybe the effects are real, but bounded in certain important ways. Maybe they only work in a lab, or only among student participants, or only under certain other conditions met in the first study but not the second. Therein lies the problem for social priming both as a scientific field and as a pop-cultural phenomenon: As failed replications pile up, as they seem to be in this domain (other famous social priming effects that have proven hard to re-create include words related to old age leading participants to walk more slowly; washing hands cleaning the conscience; and thoughts of money making us selfish), it gets harder and harder to come up with theories that 1) can explain both the original, exciting-seeming findings and the failed replications; and 2) aren’t incredibly tangled or boring.

I’ll use an extreme example to illustrate the point: Imagine if it really is the case that this particular hot-coffee priming effect only works on college students under a certain age in lab settings, as opposed to out in the world. What would anyone do with that fact? Why would it matter? It certainly couldn’t be justifiably used to make sweeping statements, presumably applicable to humans writ large, about how “Holding a warm cup of coffee can” do X or Y (let alone that hot soup can cure depression).

But regrettably that’s how these sorts of findings have all too often been communicated: “Holding a warm cup of coffee can make you feel warmer toward others!” And so on. It’s much rarer for social-priming findings – or other sexy research findings from behavioural science – to be presented in an appropriately hedged, scientifically responsible way. That wouldn’t be exciting enough.

No Evidence that Experiencing Physical Warmth Promotes Interpersonal Warmth: Two Failures to Replicate Williams and Bargh (2008)

Post written by Jesse Singal (@JesseSingal) for the BPS Research Digest. Jesse is a contributing writer at BPS Research Digest and New York Magazine. He is working on a book about why shoddy behavioral-science claims sometimes go viral for Farrar, Straus and Giroux.

Update March 13, 2019: Read a response to this blog post by Professor John Bargh, published in the comments.

11 thoughts on “Now John Bargh’s Famous Hot-Coffee Study Has Failed To Replicate”

  1. The funny thing about the social priming research is that even if it were right, it’s completely irrelevant to the question we should be asking, namely “how is it that people think”. Andrew Gelman puts this differently, pointing out that these are tiny effects that in real life would be in competition with each other (e.g. what happens when you see a muscle builder showing off his biceps at the bar next to a senior citizen), and thus even less relevant to how real humans actually behave.

  2. Reading Kahneman’s “Thinking, Fast and Slow” right now and I almost want to stop due to his cheerleading for priming. I know he admitted in a blog post that he was totally wrong, but I think a new edition is needed now. Also, I’m happy to hear that the priming experiments, which always seemed unbelievable to me, turn out to be statistically unsound.

  3. Did they control for how comfortable the subjects were in the social setting before holding the warm coffee? Graduates with a shared experience against those recruited from the street?

  4. To Jesse Singal, in response

    Let me begin by saying that I appreciate your social science coverage, which I have followed in the past. It’s important that the public be aware of the strength of the evidence behind the claims that psychologists make, and science journalism plays a vital role in delivering that information via objective, outside perspectives.

    Theory and plausibility of the physical-to-social warmth effect:

    Before getting to the actual data (most of it from other labs), let’s talk theory. The core idea behind the ‘coffee study’ is that low-level sensory experiences can activate abstract concepts, which can in turn influence judgment and decision making. I submit that this idea is, a priori, quite plausible, both in light of psychological research and everyday experience. Consider, for instance, the power of scents to activate memories and the abstract concepts associated with them. The scent of a pine tree may remind one of childhood Christmases, and (abstract) feelings of excitement (racing down the stairs to see what Santa brought), love (of parents and siblings there together with you), comfort, peace, and so on. For me personally, a ‘Beach Party’ Yankee Candle never fails to bring up memories of my (too few) trips to Hawaii, along with (abstract) notions of beauty and relaxation and emotions such as awe. In childhood especially, abstract concepts such as trust and caring come to be associated, quite naturally, with sensory experiences like a soft fuzzy blanket or the warmth of being held close. My lab wasn’t the first to notice this, of course. Both John Bowlby in his ground-breaking attachment theory and George Lakoff in his influential metaphor-language theory made the same point. More recently, the notion that abstract concepts are linked to physical sensations has inspired some of the most exciting and influential research in cognitive psychology and theoretical neuroscience, spearheaded by the likes of Lawrence Barsalou, Anil Seth, Lisa Feldman Barrett, and Karl Friston.

    So while the basic idea underlying the ‘coffee study’ seems uncontroversial – that sensory, physical experiences can activate abstract mental concepts – it can nonetheless generate some very interesting, counterintuitive predictions, such as the one we explored in the ‘coffee study’. Still, even here the idea remains plausible. Bowlby argued that all mammals who breastfeed their young come to associate, over evolutionary time periods, the experience of physical warmth from the mother’s body, and ‘social warmth’ – trust, bonding – because those two experiences are repeatedly experienced together. Harlow argued the same thing in his later ‘monkeys raised in isolation’ studies. This is just basic associative logic that underlies so much of our ‘learning by experience’ in the real world.

    Subsequent replications and neuroscience discoveries:

    This brings me to the data. Although Bowlby’s theory and Solomon Asch’s first-ever impression formation study were the bases of our prediction in the coffee study, since our study was published there has been a considerable amount of further research, especially neuroscience research using brain imaging. And all of the following studies are described in my book. First, IJzerman and Semin conceptually replicated our coffee study findings: after holding a warm beverage, participants reported feeling closer to other people compared to participants who had just held a cold beverage. Then, several brain imaging studies – many of them from the Naomi Eisenberger lab at UCLA – directly supported our hypothesis of an actual anatomical link in the human insula between the region reactive to physical warmth and a nearby region reactive to social warmth. In their several studies, either type of warmth activated both regions. When participants held something warm, or when they texted family and friends, the same insular region became active. This is pretty strong support for our hypothesized link between physical and social warmth. And it comes from a much more powerful method than the one we originally used, which after all was Solomon Asch’s original impression-formation-study method from 70 years ago.

    And there are several other supporting studies described later on – a UCLA hospital-based study in which nurses took the participants’ body temperatures hourly over the course of a full day, which covaried with patients’ hourly ratings of how close they currently felt to family and friends. The higher their body temperature, the closer they reported being to their significant others. The study showing that social rejection leads to a greater preference for warm foods such as soup for lunch, compared to a control condition, is also described. In Hans IJzerman’s study using a precision body temperature measurement device, participants’ actual body temperatures fell or rose about .7 of a degree Fahrenheit following an actual social experience of rejection or inclusion.

    The effect in real life. Outside the lab, and inside the hospital:

    There are also very recent studies that used state of the art ‘daily diary’ methods by Adam Fetterman and colleagues, in which people were paged on their smartphones outside the lab going about their daily life, showing that the more physically warm experiences a person reports in a given day, the more socially warm experiences and feelings they have too.

    The study that you did mention was the clinical ‘heat lamp’ intervention study, but you dismissed it as ‘small’. You are correct about the sample size, but it should be noted that the population is also small and difficult to access – patients so debilitatingly depressed that they had to be hospitalized for it. Yet the researchers found that a long session under the heat lamp was actually effective in reducing their suffering for the next two weeks (at least). There is no question that larger sample sizes for psychology studies are a good thing, but controlled clinical treatment intervention studies in hospitals cannot possibly have the same large samples. The alternative is to not do these studies at all – new treatments would not be discovered, and depressed patients would continue to suffer more than they should.

    What about Chabris and Simons?

    If the physical-to-social warmth effect is plausible and replicable, why didn’t Chabris and Simons (C&S) find evidence for it? There are many possibilities. First, it is possible that, despite the available evidence, the physical-to-social warmth effect is not real. I consider this highly unlikely, but not impossible. Second, it is possible that the physical-to-social warmth effect is real, but only a subset of methods is capable of finding it reliably. For instance, it is possible that the studies by IJzerman, Semin, Eisenberger, etc. all replicate, but the ‘coffee study’ does not. Third, it is possible that there were key differences between the C&S study procedure and the Williams and Bargh (W&B) study procedure that are responsible for the diverging findings.

    One potentially important difference in procedure is the temperature of the hot cup of coffee that participants held: was the coffee piping hot (so that it was somewhat uncomfortable to hold) or warm (so that it was pleasant to hold)? If the coffee was piping hot, then, according to the theory that motivated W&B, it should not activate the concept of social warmth – a positively valenced, pleasant concept. (“Hot” is not the same as just more “warm”, and actually participates in a quite different metaphor – hot vs. cool – having to do with emotionality.) If anything, an uncomfortably hot cup of coffee might be expected to activate the concept of anger (“hot-headedness”), which is antithetical to social warmth. With this in mind, there are good reasons to suspect that in C&S, the coffee was, for many participants, uncomfortably hot. Indeed, C&S purchased a hot or cold coffee at a coffee shop and then immediately handed that coffee to passersby who volunteered to take the study. Thus, the first few people to hold a hot coffee likely held a piping hot coffee (in contrast, W&B’s coffee shop was several blocks away from the site of the experiment, and they used a microwave for subsequent participants to keep the coffee at a pleasantly warm temperature). Importantly, C&S handed the same cup of coffee to as many as 7 participants before purchasing a new cup. Because of that feature of their procedure, we can check if the physical-to-social warmth effect emerged after the cups were held by the first few participants, at which point the hot coffee (presumably) had gone from piping hot to warm.

    Please view the graph here [Editor’s note: unfortunately it’s not possible to embed in the comment]:

    If you look at C&S’s data on the effect of hot vs. cold coffee on ratings of social warmth, broken down by the number of people who previously held that cup of coffee (from 0 to 6 on the x-axis), among the first 3 participants to hold a newly purchased hot or cold cup of coffee (0, 1, and 2 on the x-axis), no physical-to-social-warmth effect emerges. However, among the 4th and 5th participants to hold a cup (3 and 4 on the x-axis), those who held the hot coffee rated the social target as warmer compared to participants who held the cold coffee (this effect is not statistically significant, but close: p = .08). The effect vanishes again among the final two participants to hold the hot or cold cup. These data are consistent with the idea that, as long as the hot coffee was in the pleasantly warm range (3 and 4 on the x-axis), as it was in the W&B study, it activated the concept of social warmth and influenced social judgments accordingly (replicating W&B). But if the coffee was piping hot (0, 1, and 2 on the x-axis) or room temperature (5 and 6 on the x-axis), no such effect emerged.

    Of course, this is a post hoc story that would have to be tested. To this end, we have been in touch with C&S about collecting subjective ratings of the temperature of the coffee served at the shop where they ran the study. We’d like to see how long it takes for their coffee to go from uncomfortably hot to pleasantly warm to room temperature, and to see if that timing matches up with our hypothesis. Also, we have offered to conduct a pre-registered, collaborative study with C&S to directly test the alternative explanation I’ve raised here.

    The main point here is that C&S’s particular failure to replicate is hardly the final word on the reality of the physical-to-social warmth effect, given the mass of other evidence, nor does it call for an immediate, dramatic decrease in confidence in the effect.

    Thank you:

    Let me close by affirming that I share your goal of presenting the public with accurate information as to the state of the scientific evidence on any finding I discuss publicly. I also in good faith seek to give my best advice to the public at all times, again based on the present state of evidence. Your and my assessments of that evidence might differ, but our motivations are the same.

    John Bargh
    James Rowland Angell Professor of Psychology
    Professor of Management
    Yale University

  5. The post mentions John A. Bargh’s book “Before you know it” that contains many claims based on results published in peer-reviewed psychology journals.

    Since 2011 it has been public knowledge that these results cannot be trusted at face value because they are selected to confirm predictions. This explains the astonishing 95% success rate (Sterling, 1959).

    I have conducted a bias-corrected analysis of the claims in Bargh’s book and found that many of these claims rest on shaky empirical foundations. Thus, it is not surprising that this famous study also did not replicate.

    It is also well known that replicability is a function of the p-value of a study. The famous coffee study was just significant, p = .05. Just a slightly different result would have made it inconclusive, p = .06. These results hardly ever replicate because they are typically obtained with the help of chance.

    Even if the original study correctly showed that there is an effect in the predicted direction, the effect size could be tiny and not relevant to talk about. This is the problem of the small sample studies Bargh is known for. They can never tell us whether the effects have practical significance.

    Bargh replied and stated that he aims to base his claims on the best evidence that is available. However, simply pointing to several studies with statistically significant results is not good enough, because we don’t know how many attempts with negative results were made along the way.

    As Kahneman pointed out in an open letter in 2012, the best way to address concerns about this research would be to conduct credible replication studies with good power.

    A century ago, Fisher pointed out that a single p-value less than .05 does not prove a theory. A good study will produce significant results most of the time. Unless we see consistent replications, the warmth-effect lacks empirical support.
