By Emma Young
Of all “cross-modal” findings, the most famous is surely the bouba-kiki effect: we tend to pair round, blobby shapes with the sound bouba and spiky shapes with kiki. However, research has not yet revealed why this effect is common among adults who speak very different languages, and even in infants as young as four months.
Various theories have been put forward. One holds that levels of emotional arousal may be key — that both kiki and a spiky shape trigger relatively high levels of arousal, compared with bouba and a blob. Now a new study, reported in Psychological Science, provides compelling evidence for this idea. The researchers also take their findings further, arguing that they could have important implications for understanding the early evolution of languages.
For an initial study, Arash Aryani at Freie Universität Berlin and colleagues presented English-speaking students (at Cornell University, US) with shapes and words taken from previous studies that have investigated the bouba-kiki effect. (For example, maluma vs takete, from the original 1929 report of this word/shape effect.) In each case, the participants had to use a five-point scale to rate how calming or exciting they considered these words or shapes to be. The results revealed significantly higher arousal ratings for kiki-type than bouba-type words and shapes.
Next, the team asked a fresh group of participants to use the same scale to rate 940 computer-generated pseudo-words (which were all spoken in a neutral tone by an actor) for how arousing they sounded. These pseudo-words all sounded like they could be English words, but they had no meaning (think tylo or sooking, for example). Separately, the team analysed the pseudo-words, looking for particular acoustic features that earlier work has suggested we use to infer different emotional states. (For example, variations in the fundamental frequency of even animal sounds lead us to infer different emotions, and quickly pulsating sounds are associated with high arousal.)
The results showed a clear relationship: pseudo-words that the researchers expected, based on their acoustic analysis, to trigger more arousal received higher arousal ratings from the participants, and those expected to trigger less arousal received lower ratings.
The team then applied the acoustic model to the specific bouba-like and kiki-like words that they’d used in the first experiment. They found that the model predicted significantly higher arousal values for the kiki-like words than for the bouba-like words. This suggested, then, that the acoustic features of bouba and kiki (and similar pairs) explain the low vs high arousal ratings given by the participants in the first study.
Together, the results suggest that any word (or pseudo-word) will trigger a particular perception of arousal, depending on its acoustic features. “Any wordlike stimulus can potentially convey emotional information solely on the basis of its perceptual acoustic characteristics, making it possible to match it with emotionally similar concepts, e.g., shapes,” the team writes.
Based on the results of the second experiment, the researchers then identified groups of “high-arousal”, “medium-arousal”, and “low-arousal” pseudo-words. They asked a fresh batch of participants to match these to a range of spiky and rounded shapes (again taken from previous studies). As the team predicted, spiky shapes were most likely to be matched to high-arousal pseudo-words (followed by medium-arousal and then finally low-arousal examples), while rounded shapes were most likely to be matched to the low-arousal group. “These results suggest that the extent to which a word in a bouba-kiki experiment is matched with a rounded shape or spiky shape depends on the level of arousal elicited by its sound,” the team writes.
The findings not only support the emotional arousal theory to explain the bouba-kiki effect but may also shed light on the early stages of language evolution, they add.
The finding that humans can reliably identify levels of arousal in the vocal sounds of a range of animals is telling, they think. “An initial form of vocalisation might have been a motor reflection of arousal in the vocal tract. Such vocalisations might gradually have been used to refer to external objects that were associated with similar affective experience.” So a sharp rock could become associated with high-arousal sounds (and note the “sharp” sound of rock vs bubble, say).
If our early speech sounds and words were grounded in our emotional arousal system, this could have made them easier to learn, and so more likely to be retained, the team thinks. And though the link between a word's sound and its meaning is often held up to be arbitrary, there’s increasing evidence that this is frequently not the case: our word sound/meaning links go deeper than the onomatopoeia of crack, for instance, or the ideophone of zigzag, and can, it seems, be grounded in our emotional responses.