By guest blogger Jon Brock
Johannes Eichstaedt was sitting in a coffee shop by Lake Atitlan in Guatemala when he received a Slack message about a tweet about a preprint. In 2015, the University of Pennsylvania psychologist and his colleagues had published a headline-grabbing article linking heart disease to the language used on Twitter. They’d found that tweets emanating from US counties with high rates of heart disease mortality tended to exhibit high levels of negative emotions such as anger, anxiety, disengagement, aggression, and hate. The study, published in Psychological Science, has proven influential, already accruing over 170 citations. But three years later, a preprint by Nick Brown and James Coyne of the University of Groningen claimed to have identified “numerous conceptual and methodological limitations”. Within the month, Eichstaedt and his colleagues issued a riposte, publishing their own preprint that, they say, provides further evidence for their original conclusions.
As recent revelations surrounding Facebook and Cambridge Analytica have highlighted, corporations and political organisations attach a high value to social media data. But, Eichstaedt argues, that same data also offers rich insights into psychological health and well-being. With appropriate ethical oversight, social media analytics could promote population health and perhaps even save lives. That at least is its promise. But with big data come new challenges – as Eichstaedt’s “debate” with Brown and Coyne illustrates.
In early 2015, Nick Brown was discharged from hospital. Getting his teeth into Eichstaedt’s paper was, as he put it, “my way of picking myself up.” In recent years, Brown has developed a reputation for uncovering irregularities in published research. His investigations into the work of Cornell food scientist Brian Wansink have contributed (at the time of writing) to 8 retractions and 15 corrections. But, as Brown stresses, the discussion of Eichstaedt’s paper is very different. His concern is simply that the study has been oversold. “People who might be thinking of using social media for public health need to be aware of the counter arguments,” Brown says.
The central claim in Eichstaedt’s 2015 paper is that Twitter data can provide new information about heart disease that is not available from other measures. The analysis began with 148 million tweets, written between 2009 and 2010 and geo-tagged to US counties based on the tweeter’s user profile. Each tweet was scored according to multiple emotion “variables” by comparing its constituent words against pre-specified lists of words related to that emotion. Information from a subset of counties was then fed into a machine-learning algorithm – a computer program that worked out the combination of Twitter variables providing the closest fit to the heart disease data. When tested on the remaining counties, the Twitter model predicted heart disease rates more accurately than did similar models based on conventional measures such as a county’s racial demographics, the income and education levels of its residents, and rates of smoking and obesity. This, according to the accompanying press release, shows that “tweets are aggregating information about people that can’t be readily accessed in other ways.” Twitter analysis “could be used to marshal evidence of the effectiveness of public-health interventions on the community”.
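The general approach – scoring each tweet against an emotion word list and averaging the scores per county – can be sketched in a few lines. This is a minimal illustration with invented word lists and invented tweets; the actual study used established psycholinguistic dictionaries and far larger data:

```python
# Minimal sketch of lexicon-based emotion scoring: each tweet is scored
# by the fraction of its words that appear in an emotion word list, then
# scores are averaged per county. Word lists and tweets are invented.

ANGER_LEXICON = {"hate", "angry", "furious", "annoyed"}  # toy word list

def emotion_score(tweet: str, lexicon: set) -> float:
    """Fraction of a tweet's words that belong to the emotion lexicon."""
    words = tweet.lower().split()
    if not words:
        return 0.0
    return sum(w in lexicon for w in words) / len(words)

def county_scores(tweets_by_county: dict, lexicon: set) -> dict:
    """Average emotion score across all tweets geo-tagged to each county."""
    return {
        county: sum(emotion_score(t, lexicon) for t in tweets) / len(tweets)
        for county, tweets in tweets_by_county.items()
    }

tweets = {
    "County A": ["i hate mondays", "so angry right now"],
    "County B": ["lovely walk in the park", "great coffee today"],
}
scores = county_scores(tweets, ANGER_LEXICON)
```

In the study itself, county-level scores like these became the input variables to a model fitted on one subset of counties and evaluated on the held-out remainder – the standard guard against a model simply memorising its training data.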
Brown, however, is sceptical. His preprint lists numerous sources of bias and noise that would make it hard to find a genuine heart disease signal in the Twitter data. Words can have multiple meanings depending on the context. Twitter users are unrepresentative of the population – and users who tag their locations may not represent the wider Twitter user base. In counties with small populations, the data may be dominated by a handful of prolific users. And tweets containing certain racial terms – often used in a hostile fashion – appear to have been censored by Twitter before the data were released to researchers. There is noise in the health records too: autopsies are rarely carried out to establish cause of death, so variation in reported heart disease rates partly reflects variation in reporting practices.
For his part, Eichstaedt acknowledges that the data are noisy and in some ways biased. But, he says, the salient point is that they still found a signal. And their preprint responding to Brown and Coyne includes a new analysis of a larger and more recent sample of Twitter data, which again outperforms a model based on traditional demographic measures.
“Perhaps there is a signal,” Brown acknowledges. “But there’s an awful lot of wrongness.” And, he adds, the fact that Twitter predicts heart disease better than other measures is “in some way irrelevant” because the best predictor of a community’s future heart disease rate is its current heart disease rate. This, Eichstaedt says, is “trivially true” for most outcomes with complex underlying causes. But prediction in itself was not the goal. “We are trying to use Twitter as a source of epidemiological insight,” he says. The first step is to show that there is a signal. The next step is to work out what it means.
When his paper was published in 2015, Eichstaedt’s findings made headlines around the world. “Angry Tweeting could increase your risk of heart disease” said the UK’s Telegraph. CBS News led with “Cheerful tweets may mean a healthier heart”. The association between Twitter language and heart disease has an intuitive appeal. As Brown notes, it speaks to the notion of a “Type A” personality – permanently uptight, angry at the world, and (according to controversial research) more prone to heart disease. But in their paper, Eichstaedt and colleagues were clear. “Obviously the people tweeting are not the people dying,” they wrote. Instead, they suggested, the sentiments expressed in tweets may reflect characteristics of the communities to which tweeters and heart disease sufferers belong – their “shared economic, physical, and psychological environment”.
Brown describes this as “pure speculation”. But it’s a testable idea. If Twitter language really does measure the psychological profile of a community then it should predict mental health even better than it predicts physical health. Adapting Eichstaedt’s methods, Brown and Coyne used Twitter language to try to predict deaths from suicide instead of heart disease. They found significant correlations but in the “wrong” direction. Counties with higher rates of suicide had relatively fewer tweets containing anger, negative emotions and negative relationships and more tweets about nature, romantic love, and positive social relationships.
Eichstaedt, however, is unperturbed by this finding. “Suicide is a weird variable,” he says. It’s much rarer than heart disease or cancer, for example, and rates are typically much higher in rural communities, perhaps due to social isolation and higher rates of gun ownership. In their response to Brown and Coyne, Eichstaedt and his team report that a county’s suicide rate correlates with its altitude above sea level and the percentage of people living in rural areas. When they controlled for these two measures, the correlations between suicide and Twitter language disappeared.
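The statistical move here – “controlling for” altitude and rurality – amounts to computing a partial correlation: how strongly two variables are related once a shared driver is held constant. A minimal sketch with one covariate and invented numbers (the real analysis used actual county-level measures) shows how a strong raw correlation can vanish, or even reverse, once the confound is accounted for:

```python
# Sketch of "controlling for" a confound via first-order partial
# correlation. Data are invented: a toy "suicide rate" and a toy
# "Twitter language" score that both track a third variable (rurality),
# so their raw correlation is strong but spurious.
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def partial_corr(x, y, z):
    """Correlation of x and y with the covariate z held constant."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rurality = [1.0, 2.0, 3.0, 4.0]  # the shared driver
suicide  = [1.3, 1.7, 2.7, 4.3]  # rurality plus small deviations
language = [0.7, 2.3, 3.3, 3.7]  # rurality plus opposite deviations

raw = pearson(suicide, language)                      # strongly positive
adjusted = partial_corr(suicide, language, rurality)  # negative once rurality is held constant
```

In this toy example the raw correlation is driven entirely by rurality; removing its influence leaves only the (opposite-signed) deviations, which is the same logic by which Eichstaedt’s team argue the suicide–Twitter correlations disappear.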
A better direct measure of community mental health, Eichstaedt argues, is a survey report of “mentally unhealthy days”, which he and his colleagues have found to correlate with Twitter language in the expected direction. In fact, Eichstaedt notes, the pattern is very similar to that for heart disease. The question then is whether Twitter data can distinguish between heart disease and other conditions. Or does it reflect some underlying factor that relates to most diseases? “The jury is still out on this,” Eichstaedt admits. “We haven’t cracked that nut.”
For psychology researchers, often criticised for their small sample sizes and the questionable relevance of their studies to everyday life, the appeal of social media data is obvious. But the price of big data is a loss of experimental control, and of the ability to tease apart cause and effect. There’s also a loss of transparency. The complex analytic procedures represent a serious challenge for psychologists attempting to understand and evaluate a study. “It becomes,” says Brown, “a question of do I trust your software?”
Eichstaedt, a physicist by training, is sympathetic to this view. “When you get super fancy and the reviewers can’t follow you any more and have to take you on faith, that’s problematic,” he says. “It’s hacking science.” The solution, Brown argues, may be to move away from traditional journals and peer review. “It has to become some sort of ongoing process,” he says. In a “better world” Eichstaedt’s paper would itself have been posted as a preprint and Brown’s critique could have been part of the post-publication review process. This scenario is similar to the approach adopted in physics where preprints and responses have long been the mainstay of scientific communication.
Yet it’s clear that a similar culture shift in psychology will come with its own teething troubles and questions of etiquette. In Brown’s view, the original Eichstaedt paper was badged as “open science” so there should have been no need to seek further information from the authors. He decided not to contact them as a matter of principle. Eichstaedt, however, notes that a more collaborative approach would have allowed him to quickly address some misconceptions about the research and direct Brown and Coyne to analysis code that wasn’t available at the time of publication. “I think that would have been a much less painful way of getting to the same point”. Still, the exchange has provided an opportunity for his team to clarify some issues and make it easier for anyone in the field to replicate and adapt their analyses. “Probably an older wiser version of me will only see that positive side,” he says.