By Emma Young
You want to choose a new vacuum cleaner, or book, or hotel, or kids’ toy, or movie to watch — so what do you do? No doubt, you go online and check the star ratings for various options on sites such as Amazon or TripAdvisor, and so benefit from the wisdom of crowds.
However, there are problems with this star-based system, as a new paper in Nature Human Behaviour makes clear. Firstly, most ratings are positive, so how do you choose between two (or many more) products that all have high ratings, or even the same top rating? Secondly, note Matthew D. Rocklage at the University of Massachusetts and his colleagues, star ratings are a poor predictor of how successful a movie, book or other product will be, and so of its actual appeal to, and approval by, the wider public. The team presents an alternative method for picking the best product and predicting its success, one that focuses on the emotional responses of the reviewers.
In all four studies reported in the paper, the team used a text analysis tool called the Evaluative Lexicon, which provided measures of the average emotionality and valence (positivity) of a review. Emotionality reflects how far an attitude is rooted in emotion, as distinct from how positive or negative it is. So reviews that included lots of terms like “awe-inspiring” or “enchanting” got higher emotionality scores than reviews with terms like “impeccable”, even though all of these words are positive.
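To make the idea of separate valence and emotionality scores more concrete, here is a minimal sketch of lexicon-based scoring in Python. It assumes, for illustration only, that a review’s scores are simply the mean ratings of the evaluative words it contains. The word list and numbers in `TOY_LEXICON` are invented and are not the Evaluative Lexicon’s actual entries or ratings, and `score_review` is a hypothetical helper, not the researchers’ code.

```python
import re
from statistics import mean

# HYPOTHETICAL toy lexicon: word -> (valence, emotionality).
# The scores are invented for illustration and are NOT the real
# Evaluative Lexicon's words or normed ratings.
TOY_LEXICON = {
    "awe-inspiring": (8.5, 8.0),
    "enchanting": (8.0, 7.5),
    "impeccable": (8.5, 2.0),
    "dull": (2.0, 3.0),
}


def score_review(text, lexicon=TOY_LEXICON):
    """Return (mean valence, mean emotionality) of lexicon words in text.

    Assumption: a review's score is the plain average over the
    evaluative words it contains. Returns None if no lexicon word appears.
    """
    # Lower-case and split into words, keeping hyphenated words whole.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return None
    valence = mean(v for v, _ in hits)
    emotionality = mean(e for _, e in hits)
    return valence, emotionality


if __name__ == "__main__":
    print(score_review("An awe-inspiring, enchanting film."))  # (8.25, 7.75)
    print(score_review("Impeccable acting."))                  # (8.5, 2.0)
```

With these made-up numbers, both reviews come out strongly positive, but the first scores far higher on emotionality, which is the kind of distinction a star rating alone cannot capture.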
First, the team looked at the earliest 30 reviews for every movie listed on the website metacritic.com from 2005 to 2018. For each movie, they gathered star ratings (on a scale from 0 to 10) as well as valence and emotionality scores for the review text.
Overall, 81% of these movies got above-average star ratings. This highlights “the challenge of discerning success and how people will behave in this sea of positive ratings”, which the team calls the “positivity problem”. They also found that neither star ratings nor text valence was a good predictor of box office revenue. Higher emotionality, however, did predict higher future box office takings. (This result held when they controlled for a variety of factors, such as the movie’s genre, the year it was released and its budget.)
Next, Rocklage and colleagues used the same approach to try to predict the sales of all books listed on Amazon.com from 1995 to 2015. This time, star ratings predicted sales for some genres but not for others. Greater emotionality, however, emerged as a predictor of sales across 93 different genres, making it a consistently useful measure.
The researchers then turned to 187,206 real-time tweets posted in response to TV ads for 84 different businesses that aired during the 2016 and 2017 US Super Bowls. The team found that the greater the emotionality of the tweets about an advert, the more Facebook followers the company gained over the following two weeks. The newspaper USA Today had gathered the equivalent of star ratings for these ads, but these ratings did not predict gains in followers.
Finally, the team considered Chicago restaurant reviews on yelp.com and 1.3 million table reservations made on a popular booking website. Unlike in the earlier studies, high star ratings did predict more table reservations. However, higher emotionality still emerged as a unique predictor of the number of bookings, over and above star ratings. As the team writes, “restaurants that elicited more emotion were associated with more table reservations”.
So movies, books and restaurants that appeared to evoke more emotion in consumers ended up being more successful. Why might that be the case? Emotions flag memories as important, emotional memories are relatively easy to recall, and attitudes based on emotion tend to be more stable. All of this could plausibly influence a person’s own behaviour. “Additional work could explore whether attitudes based more on emotion also affect success by increasing individuals’ propensity to spread information via word of mouth,” the team notes.
Overall, the new work calls into question the validity and usefulness of star ratings. Such doubts are not new in themselves, but the researchers also describe what seems to be a more useful system, one that could in principle be widely adopted. “One possibility is that organizations could consider aggregating reviewers’ language and providing an ‘emotional star rating’ to provide more meaningful assessments to individuals,” they write.