Category: Methodological

How do you prove that reading boosts IQ?

By guest blogger Stuart Ritchie.

A recent study on whether reading boosts intelligence attracted global media attention: “Reading at a young age makes you smarter,” announced the Daily Mail. “Early reading boosts health and intelligence,” said The Australian.

In the race for eye-catching headlines, this mainstream media coverage arguably missed the more fascinating story of the hunt for cause and effect. Here lead author Dr Stuart Ritchie explains the science:

“Causality, it turns out, is really difficult to prove. Correlational studies, while interesting, don’t give us information about causation one way or another. The randomised controlled trial is the ‘gold standard’ method of telling whether a manipulation has an effect on an outcome. But what if a randomised experiment isn’t possible, for practical or ethical reasons? Thankfully, there is an entire toolkit of study designs that go beyond correlation, and can be used to take steps up the ladder closer to causation.

Say you wanted to find interventions that cause intelligence to increase. Since childhood intelligence test scores are so powerfully predictive of later educational success, as well as health and wealth, it’s of great importance to find out how they might be improved. All sorts of nutritional supplements and training programmes have been tried, but all have failed (so far) to reliably show benefits for IQ. However, one factor that has been convincingly shown to cause improvements in intelligence test scores is education. It wouldn’t exactly be ethical to remove some children from school at random and see how they do in comparison to their educated peers. But in a step up the aforementioned causal ladder, researchers in 2012 used a ‘natural experiment’ in the Norwegian education system (where compulsory years of education were increased in some areas but not others) to show that each year’s worth of extra education added 3.6 IQ points.

What is it about education that’s driving these effects? Could it be that a very basic process like learning to read is causing the improvements in IQ? Keith Stanovich and colleagues showed, in a number of studies in the 1990s, that earlier levels of reading interest (though not ability) were predictive of later levels of verbal intelligence, even after controlling for children’s initial verbal intelligence. In a 1998 review, they concluded that “reading will make [children] smarter”.

On the ladder of causation, a control for pre-existing ability in a non-experimental design is important, but problems remain. For instance, since we know that common genes contribute to reading and intelligence, any study that fails to measure or control for genetic influences can’t rule out the possibility that the early reading advantage and the later intelligence benefit are due simply to a shared genetic basis that is, say, expressed at different times in different areas of the brain. If only there were a way of cloning children – comparing one “baseline” version of each child against a second version with improved reading ability, and then seeing if the better reading translated to higher intelligence later in development…

This sounds like a far-fetched fantasy experiment. But in a recent study, my colleagues and I did just that, though we left it to nature to do the cloning. Tim Bates, Robert Plomin, and I analysed data from 1,890 pairs of identical twins who were part of the Twins Early Development Study (TEDS). The twins had their reading ability and intelligence tested on multiple measures (averaged into a composite) at ages 7, 9, 10, 12, and 16. For each twin pair at each age, we calculated the difference between one twin and the other on both variables. Since each pair was near-100 per cent identical genetically, and was brought up in the same family, these differences must have been caused purely by the ‘non-shared environment’ (that is, environmental influences experienced by one twin but not the other).

We found that twins who had an advantage over their co-twin on reading at earlier points in their development had higher intelligence test scores later on. Because this analysis controls for initial IQ differences, as well as genetics and socioeconomic circumstances, it is considerably more compelling than previous results that used less well-controlled designs. It’s important to note that we found associations between earlier reading ability and later nonverbal intelligence, as well as later verbal intelligence. So, beyond the not-particularly-surprising finding that being better at reading might help with a child’s vocabulary, we made the pretty-surprising finding that it might also help with a child’s problem solving and reasoning ability. Why?

We now enter the realm of speculation. It might be that reading allows children to practise the skills of assimilating information and abstract thought that are useful when completing IQ tests. The process of training in reading may also help teach children to concentrate on tasks—like IQ tests—that they’re asked to complete. Our research doesn’t shed light on these mechanisms, but we hope future studies will.

One should not give our study a criticism-free ride just because it tells a cheery, ‘good news’ story. A step up toward causation is not causation. Could there have been alternative explanations for our findings? Certainly. It is possible that, for instance, teachers spot a child with a reading advantage and give them additional attention, raising their intelligence ‘without’, as we say in the paper, ‘reading doing the causal “work”’. It may also have been that our controls were inadequate – as I said above, identical twins are nearly genetically identical, but a small number of unique genetic mutations might occur within each pair. The largest lacuna in our study, though, was the cause of the initial within-pair reading differences. Whether these were caused by teaching, peers, pure luck, or some other process, we couldn’t tell, and it’s of great interest to find out.

We hope that our study encourages researchers in three ways. First, in the eternal quest for intelligence-boosters, instead of looking to flashy new brain-training games or the like, they might wish to examine, and maximise, the potentially IQ-improving effects of ‘everyday’ education. Second, they could attempt to answer the questions raised by our study. Why do identical twins differ in reading, and are the reasons under a teacher’s control? What are the specific mechanisms that might lead from literacy to intelligence? Third, and more generally, we hope it will inspire them to consider new methods, including the twin-differences design, that edge further up the causal ladder, away from the basic correlational study. The data are, of course, far harder to collect, but the stronger inferences found there are well worth the climb.”


Ritchie, S., Bates, T., & Plomin, R. (2014). Does Learning to Read Improve Intelligence? A Longitudinal Multivariate Analysis in Identical Twins From Age 7 to 16. Child Development. DOI: 10.1111/cdev.12272

Post written by Stuart J. Ritchie, a Research Fellow in the Centre for Cognitive Ageing and Cognitive Epidemiology at the University of Edinburgh. Follow him on Twitter: @StuartJRitchie

The mistakes that lead therapists to infer psychotherapy was effective when it wasn’t

How well can psychotherapists and their clients judge from personal experience whether therapy has been effective? Not well at all, according to a paper by Scott Lilienfeld and his colleagues. The fear is that this can lead to the continued practice of ineffective, or even harmful, treatments.

The authors point out that, like the rest of us, clinicians are subject to four main biases that skew their ability to infer the effectiveness of their psychotherapeutic treatments. This includes the mistaken belief that we see the world precisely as it is (naive realism), and our tendency to pursue evidence that backs our initial beliefs (the confirmation bias). The other two are illusory control and illusory correlations – thinking we have more control over events than we do, and assuming the factors we’re focused on are causally responsible for observed changes.

These features of human thought lead to several specific mistakes that psychotherapists and others commit when they make claims about the effectiveness of psychological therapies. Lilienfeld’s team call these mistakes “causes of spurious therapeutic effectiveness” or CSTEs for short. The authors have created a taxonomy of 26 CSTEs arranged into three categories.

The first category includes 15 mistakes that lead to the perception that a client has improved, when in fact he or she has not. These include palliative benefits (when the client feels better about their symptoms without actually showing any tangible improvement); confusing insight with improvement (when the client better understands their problems, but does not actually show recovery); and the therapist’s office error (confusing a client’s presentation in-session with their behaviour in everyday life).

The second category consists of errors that lead therapists and their clients to infer that symptom improvements were due to the therapy, and not some other factor, such as natural recovery that would have occurred anyway. Among these eight mistakes are a failure to recognise that many disorders are cyclical (periods of recovery interspersed with phases of more intense symptoms); ignoring the influence of events occurring outside of therapy, such as an improved relationship or job situation; and the influence of maturation (disorders seen in children and teens can fade as they develop).

The third and final category of errors are those that lead to the assumption that improvements are caused by unique features of a therapy, rather than factors that are common to all therapies. Examples here include not recognising placebo effects (improvements stemming from expectations) and novelty effects (improvements due to initial enthusiasm).

To counter the many CSTEs, Lilienfeld’s group argue we need to deploy research methods including using well-validated outcome measures, taking pre-treatment measures, blinding observers to treatment condition, conducting repeated measurements (thus reducing the biasing impact of irregular everyday life events), and using control groups that are subjected to therapeutic effects common to all therapies, but not those unique to the treatment approach under scrutiny.

“CSTEs underscore the pressing need to inculcate humility in clinicians, researchers, and students,” conclude Lilienfeld and his colleagues. “We are all prone to neglecting CSTEs, not because of a lack of intelligence but because of inherent limitations in human information processing. As a consequence, all mental health professionals and consumers should be sceptical of confident proclamations of treatment breakthroughs in the absence of rigorous outcome data.”


Lilienfeld, S., Ritschel, L., Lynn, S., Cautin, R., & Latzman, R. (2014). Why Ineffective Psychotherapies Appear to Work: A Taxonomy of Causes of Spurious Therapeutic Effectiveness. Perspectives on Psychological Science, 9(4), 355-387. DOI: 10.1177/1745691614535216

–further reading–
When therapy causes harm

Post written by Christian Jarrett (@psych_writer) for the BPS Research Digest.

What the textbooks don’t tell you – one of psychology’s most famous experiments was seriously flawed


By Christian Jarrett

Conducted in 1971, the Stanford Prison Experiment (SPE) has acquired a mythical status and provided the inspiration for at least two feature-length films. You’ll recall that several university students allocated to the role of jailor turned brutal and the study had to be aborted prematurely. Philip Zimbardo, the experiment’s lead investigator, says the lesson from the research is that in certain situations, good people readily turn bad. “If you put good apples into a bad situation, you’ll get bad apples,” he has written.

The SPE was criticised back in the 70s, but that criticism has noticeably escalated and widened in recent years. New details to emerge show that Zimbardo played a key role in encouraging his “guards” to behave in tyrannical fashion. Critics have pointed out that only one third of guards behaved sadistically (this argues against the overwhelming power of the situation). Question marks have also been raised about the self-selection of particular personality types into the study. Moreover, in 2002, the social psychologists Steve Reicher and Alex Haslam conducted the BBC Prison Study to test the conventional interpretation of the SPE. The researchers deliberately avoided directing their participants as Zimbardo had his, and this time it was the prisoners who initially formed a strong group identity and overthrew the guards.

Given that the SPE has been used to explain modern-day atrocities, such as at Abu Ghraib, and given that nearly two million students are enrolled in introductory psychology courses in the US, Richard Griggs, professor emeritus at the University of Florida, says “it is especially important that coverage of it in our texts be accurate.”

So, have the important criticisms and reinterpretations of the SPE been documented by key introductory psychology textbooks? Griggs analysed the content of 13 leading US introductory psychology textbooks, all of which have been revised in recent years, including:  Discovering Psychology (Cacioppo and Freberg, 2012); Psychological Science (Gazzaniga et al, 2012); and Psychology (Schacter et al, 2011).

Of the 13 analysed texts, 11 dealt with the Stanford Prison Experiment, providing between one and seven paragraphs of coverage. Nine included photographic support for the coverage. Five provided no criticism of the SPE at all. The other six provided only cursory criticism, mostly focused on the questionable ethics of the study. Only two texts mentioned the BBC Prison Study. Only one text provided a formal scholarly reference to a critique of the SPE.

Why do the principal psychology introductory textbooks, at least in the US, largely ignore the wide range of important criticisms of the SPE? Griggs didn’t approach the authors of the texts so he can’t know for sure. He thinks it unlikely that ignorance is the answer. Perhaps the authors are persuaded by Zimbardo’s answers to his critics, says Griggs, but even so, the criticisms should be mentioned and referenced. Another possibility is that textbook authors are under pressure to shorten their texts, but surely they are also under pressure to keep them up-to-date.

It would be interesting to compare coverage of the SPE in European introductory texts. Certainly there are contemporary books by British psychologists that do provide more in-depth critical coverage of the SPE.

Griggs’ advice for textbook authors is to position coverage of the SPE in the research methods chapter (instead of under social psychology), and to use the experiment’s flaws as a way to introduce students to key issues such as ecological validity, ethics, demand characteristics and subsequent conflicting results. “In sum,” he writes, “the SPE and its criticisms comprise a solid thread to weave numerous research concepts together into a good ‘story’ that would not only enhance student learning but also lead students to engage in critical thinking about the research process and all of the possible pitfalls along the way.”


Griggs, R. (2014). Coverage of the Stanford Prison Experiment in Introductory Psychology Textbooks. Teaching of Psychology, 41(3), 195-203. DOI: 10.1177/0098628314537968

further reading
Foundations of sand? The lure of academic myths and their place in classic psychology
Tyranny and The Tyrant,  From Stanford to Abu Ghraib (pdf; Phil Banyard reviews Zimbardo’s book The Lucifer Effect).


Post written by Christian Jarrett (@psych_writer) for the BPS Research Digest

How your mood changes your personality


Except in extreme cases of illness or trauma, we usually expect each other’s personalities to remain stable through life. Indeed, central to the definition of personality is that it describes pervasive tendencies in a person’s behaviour and ways of relating to the world. However, a new study highlights the reality – your personality is swayed by your current mood, especially when you’re feeling down.

Jan Querengässer and Sebastian Schindler twice measured the personality of 98 participants (average age 22; 67 per cent female), with a month between each assessment. Before one of the assessments, the participants watched a ten-minute video designed to make them feel either sad or happy. The sad clip combined a scene from the film Philadelphia with Barber’s Adagio for Strings; the happy clip showed families reunited after the fall of the Berlin Wall, set to Mozart’s Eine kleine Nachtmusik. Before their other personality assessment, the participants watched a neutral video about people with extreme skills.

When participants answered questions about their personality in a sad state, they scored “considerably” higher on trait neuroticism, and “moderately” lower on extraversion and agreeableness, as compared with when they completed the questionnaire in a neutral mood state. There was also a trend for participants to score higher on extraversion when in a happy mood, but this didn’t reach statistical significance. The weaker effect of happy mood on personality may be because people’s supposed baseline mood (after the neutral video) was already happy. Alternatively, perhaps sad mood really does have a stronger effect on personality scores than happiness. This would make sense from a survival perspective, the researchers said, because sadness is usually seen as a state to be avoided, while happiness is a state to be maintained. “Change is more urgent than maintenance,” they explained.

These results complement previous research suggesting that a person’s personality traits are associated with more frequent experience of particular emotions. For example, there’s evidence that high scorers on extraversion experience more happiness than lower scorers. However, the new data highlight how the relationship can work both ways – with current emotional state also influencing personality (or the measurement of personality, at least). We are familiar with this in our everyday lives – even our most vivacious friends can seem less friendly and sociable when they’re down. With strangers though, it’s easy to forget these effects and assume that their behaviour derives from fixed personality rather than temporary mood.

Although this research appears to challenge the notion of personality as fixed, the results, if heeded, could actually help us drill down to a person’s underlying long-term traits. As Querengässer and Schindler explained, “becoming aware of participants’ emotional state and paying attention to the possible implications on testing could lead to a notable increase in the stability of assessed personality traits.”

Querengässer, J., & Schindler, S. (2014). Sad but true? How induced emotional states differentially bias self-rated Big Five personality traits. BMC Psychology, 2(1). DOI: 10.1186/2050-7283-2-14

–further reading–
Why are extraverts happier?
Situations shape personality, just as personality shapes situations

Post written by Christian Jarrett (@psych_writer) for the BPS Research Digest.

Facebook mood manipulation study – the outcry and counter reaction (link feast special)

There’s been an outcry after Facebook manipulated the news feeds of nearly 700,000 of its users, as part of a newly published investigation into online emotional contagion. Here we bring you a handy round-up of some of the ensuing commentary and reaction.

The Furore
The Psychologist magazine brings us up to speed with the main findings and fallout from the affair (Wired’s detailed coverage is also good).

The “Apology”
Lead author on the paper, in-house Facebook researcher Adam Kramer, took to Facebook on June 29 to apologise. “…our goal was never to upset anyone,” he writes. Kramer’s co-authors were researchers at Cornell University.

The Statement
Cornell University claim that their researchers analysed data collected by Facebook, and had no part in data collection themselves. “Cornell University’s Institutional Review Board concluded that he [co-author Professor Jeffrey Hancock] was not directly engaged in human research and that no review by the Cornell Human Research Protection Program was required.”

The Evasion
Princeton University psychologist Susan Fiske – editor at PNAS where the research was published – told Guardian blogger Chris Chambers that she didn’t have time to answer all his questions about the study. Retorts Chambers: “In what version of 2014 is it acceptable for journals, universities and scientists to offer weasel words and obfuscation in response to simple questions about research ethics?”

The Bad Research Methods
John Grohol at World of Psychology pointed out that the text analysis used in the research was flawed. For example, “I am not having a great day” and “I am not happy” would be rated positive because of the presence of the words “great” and “happy”.
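Grohol’s point is easy to demonstrate with a toy word-count sentiment scorer of the kind he describes (a simplified sketch for illustration; the word lists and function name are hypothetical, not the actual tool used in the study):

```python
POSITIVE = {"great", "happy", "good"}
NEGATIVE = {"sad", "awful", "bad"}

def naive_sentiment(text: str) -> int:
    # Counts positive minus negative words, ignoring negation entirely.
    words = [w.strip(".,!?") for w in text.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Both sentences are negative in meaning, yet score as positive
# because "great" and "happy" appear in the word lists.
print(naive_sentiment("I am not having a great day"))  # 1
print(naive_sentiment("I am not happy"))  # 1
```

Because the negating “not” is simply ignored, any analysis built on such counts will systematically misclassify negated statements.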

The Outcry
Over at ZDNet, tech blogger Steven J. Vaughan-Nichols said he knew all Facebook users were guinea pigs, but not that they were lab rats. “Stop it, Facebook. Stop it now. And, never, ever do anything like this again.”

The Outcry II
NPR blogger Linda Holmes: “I speak here as a Facebook user and straight from the heart: It’s gross. It’s gross.”

The Counter Reaction
“Facebook users more or less get what they should expect — and what they deserve, given that they use Facebook’s service for free”. This is the opinion of California Polytechnic ethicist Patrick Lin, as paraphrased by the Wall Street Journal.

The Counter Reaction II
Calm down, says Alice Park at Time, “what Facebook did was scientifically acceptable, ethically allowable and, let’s face it, probably among the more innocuous ways that you’re being manipulated in nearly every aspect of your life.” (Similar sentiments from a Forbes technology writer and from Tal Yarkoni at New Scientist.)

The Solution?
“Until Facebook changes its practices,” says Selena Larson at “there’s only one way to assuredly remove yourself as a candidate for a scientific experiment: Delete your Facebook account.”



Post compiled by Christian Jarrett (@psych_writer) for the BPS Research Digest.

How burnt-out students could be skewing psychology research

It’s well known that psychology research relies too heavily on student volunteers. So many findings are assumed to apply to people in general, when they could be a quirk unique to undergrads. Now Michael Nicholls and his colleagues have drawn attention to another problem with relying on student participants – those who volunteer late in their university term or semester lack motivation and tend to perform worse than those who volunteer early.

A little background about student research participants. Psychology students often volunteer for numerous studies throughout a semester. Usually, they’re compelled to do this at least once in return for course credits that count towards their degree. Other times they receive cash or other forms of compensation. When in the semester they opt to volunteer for course credit is usually down to their discretion. To over-generalise, conscientious students tend to volunteer early in the semester, whereas less disciplined students leave it until the last minute, when time is short and deadlines are pressing.

Nicholls’ team first recruited 40 student participants (18 men) at Flinders University during the third week of a 14-week semester. Half of them were first years who’d chosen to volunteer early in return for course credits. The other half of the participants, who hailed from various year groups, had chosen the option to receive $10 compensation. The challenge for both groups of students was the same – to perform 360 trials of a sustained attention task. On each trial they had to press a button as fast as possible if they saw any number between 1 and 9, except for the number 3, in which case they were to withhold responding.

At this early stage of the semester there was no difference in the performance (based on speed and accuracy) of the students who volunteered for course credit or for money. There was also no difference in their motivation levels, as revealed in a questionnaire.

Later in the semester, between weeks 9 and 12, the researchers repeated the exercise, with 20 more students who’d enrolled for course credit and 20 more who’d applied to participate in return for cash compensation. Now the researchers found a difference between the groups: those receiving financial payment outperformed those who had volunteered in return for course credit. The late-semester course-credit group also showed more variability in their performance than their course-credit counterparts had done at the start of the semester, and they reported having lower motivation.

These results suggest that students who wait to volunteer for course credit until late in the semester lack motivation and their performance suffers as a result. Nicholls and his colleagues explained that their findings have serious implications for experimental design. “A lack of motivation and/or poorer performance may introduce noise into the data and obscure effects that may have been significant otherwise. Such effects become particularly problematic when experiments are conducted at different times of semester and the results are compared.”

One possible solution for researchers planning to compare findings across experiments conducted at different ends of a semester, is to ensure that they only test paid participants. Unlike participants who are volunteering for course credit, those who are paid seem to have consistent performance and motivation across the semester.


Nicholls, M., Loveless, K., Thomas, N., Loetscher, T., & Churches, O. (2014). Some participants may be better than others: Sustained attention and motivation are higher early in semester. The Quarterly Journal of Experimental Psychology, 1-19. DOI: 10.1080/17470218.2014.925481

further reading
The use and abuse of student participants
Improving the student participant experience

Post written by Christian Jarrett (@psych_writer) for the BPS Research Digest.

A replication tour de force

In his famous 1974 lecture, Cargo Cult Science, Richard Feynman recalls his experience of suggesting to a psychology student that she should try to repeat a previous experiment before attempting a novel one:

“She was very delighted with this new idea, and went to her professor. And his reply was, no, you cannot do that, because the experiment has already been done and you would be wasting time. This was in about 1947 or so, and it seems to have been the general policy then to not try to repeat psychological experiments, but only to change the conditions and see what happened.”

Despite the popularity of the lecture, few took his comments about lack of replication in psychology seriously – and least of all psychologists. Another 40 years would pass before psychologists turned a critical eye on just how often they bother to replicate each other’s experiments. In 2012, US psychologist Matthew Makel and colleagues surveyed the top 100 psychology journals since 1900 and estimated that for every 1000 papers published, just two sought to closely replicate a previous study. Feynman’s instincts, it seems, were spot on.

Now, after decades of the status quo, psychology is finally coming to terms with the idea that replication is a vital ingredient in the recipe of discovery. The latest issue of the journal Social Psychology reports an impressive 15 papers that attempted to replicate influential findings related to personality and social cognition. Are men really more distressed by infidelity than women? Does pleasant music influence consumer choice? Is there an automatic link between cleanliness and moral judgements?


Several phenomena replicated successfully. An influential finding by Stanley Schachter from 1951 on ‘deviation rejection’ was successfully repeated by Eric Wesselmann and colleagues. Schachter had originally found that individuals whose opinions persistently deviate from a group norm tend to be disempowered by the group and socially isolated. Wesselmann replicated the result, though found that the effect was smaller than originally supposed.

On the other hand, many supposedly ‘classic’ effects could not be found. For instance, there appears to be no evidence that making people feel physically warm promotes social warmth, that asking people to recall immoral behaviour makes the environment seem darker, or for the Romeo and Juliet effect.

The flagship of the special issue is the Many Labs project, a remarkable effort in which 50 psychologists located in 36 labs worldwide collaborated to replicate 13 key findings, across a sample of more than 6000 participants. Ten of the effects replicated successfully.

Adding further credibility to this enterprise, each of the studies reported in the special issue was pre-registered and peer reviewed before the authors collected data. Study pre-registration ensures that researchers adhere to the scientific method and is rapidly emerging as a vital tool for increasing the credibility and reliability of psychological science.

The entire issue is open access and well worth a read. I think Feynman would be glad to see psychology leaving the cargo cult behind and, for that, psychology can be proud too.

– Further reading: A special issue of The Psychologist on issues surrounding replication in psychology.


Klein, R., Ratliff, K., Vianello, M., Adams, Jr., R., Bahník, Bernstein, M., Bocian, K., Brandt, M., Brooks, B., Brumbaugh, C., Cemalcilar, Z., Chandler, J., Cheong, W., Davis, W., Devos, T., Eisner, M., Frankowska, N., Furrow, D., Galliani, E., Hasselman, F., Hicks, J., Hovermale, J., Hunt, S., Huntsinger, J., IJzerman, H., John, M., Joy-Gaba, J., Kappes, H., Krueger, L., Kurtz, J., Levitan, C., Mallett, R., Morris, W., Nelson, A., Nier, J., Packard, G., Pilati, R., Rutchick, A., Schmidt, K., Skorinko, J., Smith, R., Steiner, T., Storbeck, J., Van Swol, L., Thompson, D., van ’t Veer, A., Vaughn, L., Vranka, M., Wichman, A., Woodzicka, J., & Nosek, B. (2014). Data from Investigating Variation in Replicability: A “Many Labs” Replication Project. Journal of Open Psychology Data, 2(1). DOI: 10.5334/

Post written for the BPS Research Digest by guest host Chris Chambers, senior research fellow in cognitive neuroscience at the School of Psychology, Cardiff University, and contributor to the Guardian psychology blog, Headquarters.

Antidepressant brain stimulation: Promising signs or continuing doubts?

Depression is a growing public health concern, affecting 1 in 9 people at some point in their lives, and with a third of sufferers experiencing little or no benefit from medication. The World Health Organization predicts that by 2020 depression will become the second leading cause of disability worldwide. By 2026 it is expected to afflict nearly 1.5 million people in the UK, costing the economy more than £12bn every year.

Faced with this crisis, scientists have looked for alternative solutions to medication. Since the mid 1990s there has been a steady interest in developing brain stimulation methods as antidepressants, particularly for patients who are resistant to drug therapy. The general logic of this approach is that because depression is associated with abnormally low activity in the left prefrontal cortex, methods that increase prefrontal activity, such as transcranial magnetic stimulation (TMS), might help promote recovery.

A new Taiwanese study now reports that a particularly potent form of transcranial magnetic stimulation called theta burst stimulation could lead to benefits in treatment-resistant depression. Cheng-Ta Li and colleagues compared the efficacy of three different types of theta burst stimulation: a protocol believed to increase activity in the left prefrontal cortex, one that reduces activity in the right prefrontal cortex, and a combined protocol that seeks to achieve both in the same treatment session. Compared with sham (placebo) stimulation, the team found that two weeks of daily treatment using the combined protocol was most effective, reducing self-ratings of depression by about 35 per cent.


These results are promising but preliminary. The sample size was small, with just 15 patients per group, and the trial was not preregistered. Such limitations are common in a literature dominated by controversy and small exploratory reports. A major 2007 study, which concluded that TMS is clinically effective (and which led to the treatment being approved by the FDA), was later criticised for selectively reporting positive outcomes, deviating from its registered analysis protocol, and being contaminated by uncontrolled placebo effects. The most recent review of the evidence concluded that the benefits of TMS, while statistically measurable, are so small as to be clinically insignificant. And as to how these benefits arise in the first place – well, the truth is we have almost no idea. Our best guess is ‘because dopamine’.
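To give a sense of the scale of the problem, here is a minimal sketch of a power calculation for a two-sided, two-sample t-test with 15 patients per group. The ‘medium’ effect size of d = 0.5 is an illustrative assumption, not a figure taken from the Li et al. trial:

```python
import math
from scipy import stats

def two_sample_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample t-test with equal group sizes."""
    df = 2 * n_per_group - 2
    ncp = d * math.sqrt(n_per_group / 2)     # noncentrality under the alternative
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical value
    tail_hi = 1 - stats.nct.cdf(t_crit, df, ncp)
    tail_lo = stats.nct.cdf(-t_crit, df, ncp)
    return tail_hi + tail_lo

# With 15 patients per group and a medium effect (d = 0.5), power is low:
power_15 = two_sample_power(0.5, 15)   # roughly 0.26

# Per-group sample size needed to reach the conventional 80 per cent power
n_needed = 2
while two_sample_power(0.5, n_needed) < 0.8:
    n_needed += 1                      # comes out at around 64 per group
```

In other words, under these assumptions a real medium-sized benefit would be missed roughly three times out of four, and the trial would need around four times as many patients per group to detect it reliably.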

These uncertainties, in turn, raise concerns about ethics and regulation. With a growing number of companies offering TMS as a private healthcare intervention, and with standard treatments running into thousands of pounds, the fact that its efficacy remains unproven and unexplained is especially pertinent.

Notwithstanding these issues, this latest study by Li and colleagues is a helpful addition to the literature and suggests that more potent neurological interventions, such as theta burst stimulation, have potential. But realising that potential will require a commitment to rigorous and unbiased research practices. We need meticulously preregistered studies to prevent negative findings being censored and to ensure that authors don’t cherry pick analyses that ‘work’ or engage in other questionable research practices. We need studies with larger samples to determine which individual differences determine the efficacy of TMS and to generate reproducible effects. And we need a renewed focus on understanding the neurobiology underlying any benefits of TMS.

Once these challenges are met, brain stimulation may well provide a complementary treatment for depression. For now, though, the jury is out.


Li CT, Chen MH, Juan CH, Huang HH, Chen LF, Hsieh JC, Tu PC, Bai YM, Tsai SJ, Lee YC, & Su TP (2014). Efficacy of prefrontal theta-burst stimulation in refractory depression: a randomized sham-controlled study. Brain: A Journal of Neurology. PMID: 24817188

Post written for the BPS Research Digest by guest host Chris Chambers, senior research fellow in cognitive neuroscience at the School of Psychology, Cardiff University, and contributor to the Guardian psychology blog, Headquarters.

‘I’ll have what she’s having’ – Developing the Faking Orgasm Scale for Women

Over 65 per cent of women are believed to have done it at least once in their lives. Magazines, TV shows and self-help books all talk about it. It features in one of the most memorable movie scenes ever. What am I talking about? Faking orgasm, of course.

I’ve done it. I wonder if you have too?

Let’s distract ourselves from this potentially awkward moment with a study by Cooper and colleagues, who have created the Faking Orgasm Scale.

When I saw this paper I bristled at the thought of yet another tool to over-diagnose our sexual lives. Really, does it matter if we fake? Doesn’t this surveillance reduce trust and put more pressure on people?

But I liked their discussion of what orgasm might be, why the pressure to ‘achieve’ ‘mind-blowing orgasms’ exists in Western culture, and who perpetuates it. (Clue: it’s not just the media; the medicalisation of women’s sexual problems by the pharmaceutical industry doesn’t help either.)

In a two-stage study, respondents (all heterosexual female college students majoring in psychology) were asked when, why and how they had faked orgasm. The researchers then distilled their answers into four categories:

Altruistic Deceit (e.g. faked orgasm to make a partner happy or prevent them feeling guilty)
Fear and Insecurity (e.g. faked orgasm because they felt ashamed they couldn’t experience orgasm)
Elevated Arousal (e.g. faked an orgasm to get more turned on or increase the intensity of the experience)
Sexual Adjournment (e.g. faked orgasm to end a sexual encounter because of tiredness or a lack of enjoyment)

We tend to view faking orgasm as manipulative, whereas this research suggested that it could well play a positive role in increasing arousal. I could see an additional measure of distress being useful here to identify whether the faking was something done pleasurably to enhance sex, or an indication of other sexual or relationships problems where perhaps education or therapy might be of benefit.


Wait! I’m sure you’ve already spotted that these participants might be a bit WEIRD [Western, Educated, Industrialised, Rich, Democratic], so how useful is this study? The authors are up front about their research being limited by the use of a volunteer student sample, and because of this I think the Faking Orgasm Scale may be better described as a tool in development than as an established measure.

For that to happen, the scale would need further research with bisexual and lesbian women, trans women, women in long-term relationships, and those who are not US psychology majors. It could also broaden into sexual experiences beyond penis-in-vagina intercourse and oral sex (the two activities respondents were required to have both tried and faked orgasm during).

The researchers note that ‘faking orgasm… seems to have been overlooked almost entirely as a male sexual practice’ – a gap future research could usefully address, not least because existing qualitative work indicates that faking orgasm is not unique to women and may be equally prevalent in men.

I can see therapists, researchers and healthcare providers welcoming a tool that might encourage us to open up about our sexual experiences. I could also see some practitioners taking issue with a quantified measure of complex behaviour and notions of authenticity and sexual behaviour. Me? I’d welcome anything that might allow us to talk more openly about orgasm so as to resist or reinvent the representations of perfectable sex we’re currently encouraged to aspire to.

– Further reading from The Psychologist – Orgasm.

Cooper EB, Fenigstein A, & Fauber RL (2014). The Faking Orgasm Scale for Women: psychometric properties. Archives of Sexual Behavior, 43(3), 423-435. PMID: 24346866

Post written for the BPS Research Digest by guest host Petra Boynton, Senior Lecturer in International Primary Care Research, University College London and the Telegraph’s Agony Aunt.

Kidding ourselves on educational interventions?

Journals, especially high-impact journals, are notorious for not being interested in publishing replication studies, especially those that fail to obtain an interesting result. A recent paper by Lorna Halliday emphasises just how important replications are, especially when they involve interventions that promise to help children’s development.

Halliday focused on a commercially available program called Phonomena, which was developed with the aim of improving children’s ability to distinguish between speech sounds – a skill thought to be important for learning to read, as well as for children learning English as a second language. An initial study reported by Moore et al in 2005 gave promising results: a group of 18 children trained for six hours using Phonomena showed improvements on tests of phonological awareness from pre- to post-training, whereas 12 untrained children did not.

In a subsequent study, however, Halliday and colleagues failed to replicate the positive results of Moore’s group using similar methods and stimuli. Although children showed some learning of the contrasts they were trained on, this did not generalise to tests of phonology or language administered before and after training. Rather than just leaving us with this disappointing result, however, Halliday went on to investigate possible reasons for the failure to replicate, and her analysis should be required reading for anyone contemplating an intervention study, revealing as it does a number of apparently trivial factors that can determine a study’s results.

The different results could not easily be accounted for by differences between the samples of children, who were closely matched in terms of their pre-training scores. In terms of statistical power, the Halliday sample was larger, so it should have had a better chance of detecting a true effect if one existed. There were some procedural differences in the training methods used in the two studies, but these led to better learning of the trained contrasts in the Halliday study, so we might have expected more transfer to the phonological awareness tests; in fact, the opposite was the case.

Halliday notes a number of factors that did differ between studies and which may have been key. First, the Halliday study used random assignment of children to training groups, whereas the Moore study gave training to one tutor group and used the other as a control group. This is potentially problematic because children will have been exposed to different teaching during the training interval. Second, both the experimenter and the children themselves were aware of which group was which in the Moore study. In the Halliday study, in contrast, two additional control groups were used who also underwent training. This avoids the problems of ‘placebo’ effects that can occur if children are motivated by the experience of training, or if they improve because of greater familiarity with the experimenter. Ideally, in a study like this, the experimenter should be blind to the child’s group status. This was not the case for either of the studies, leaving them open to possible experimenter bias, but Halliday noted that in her study the experimenter did not know the child’s pre-test score, whereas in the Moore study, the experimenter was aware of this information.

Drilling down to the raw data, Halliday noted an important contrast between the two studies. In the Moore study, the untreated control group showed little gain on the outcome measures of phonological processing, whereas in her study they showed significant improvements on two of the three measures. It’s feasible that this might have been because of the fact that the controls in the Halliday study were rewarded for participation, had regular contact with the experimenters throughout the study, and were tested at the end of the study by someone who was blind to their prior test score.

There has been much debate about the feasibility and appropriateness of using Randomised Controlled Trial methodology in educational settings. Such studies are hard to do, but their stringent methods have evolved for very good reasons: unless we carefully control all aspects of a study, it is easy to kid ourselves that an intervention has a beneficial effect, when in fact, a control group given similar treatment but without the key intervention component may do just as well.
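The random-allocation step that separated the two studies is simple to mechanise. As a minimal sketch (the function name and group labels here are hypothetical, not taken from either study), balanced random assignment might look like this:

```python
import random

def randomise(participant_ids, groups=("trained", "untrained"), seed=None):
    """Allocate participants to groups at random, in equal numbers, so that
    pre-existing differences (e.g. which tutor group a child is in, or what
    teaching they receive meanwhile) cannot systematically favour one arm."""
    if len(participant_ids) % len(groups):
        raise ValueError("sample size must divide evenly across groups")
    labels = list(groups) * (len(participant_ids) // len(groups))
    random.Random(seed).shuffle(labels)   # seeded for a reproducible allocation
    return dict(zip(participant_ids, labels))

allocation = randomise(range(30), seed=1)
```

Blinding is a separate step on top of this: ideally, the person administering the outcome measures never sees the allocation, or the children’s pre-test scores, at all.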

Halliday LF (2014). A Tale of Two Studies on Auditory Training in Children: A Response to the Claim that ‘Discrimination Training of Phonemic Contrasts Enhances Phonological Processing in Mainstream School Children’ by Moore, Rosenberg and Coleman (2005). Dyslexia. PMID: 24470350

Post written for the BPS Research Digest by guest host Dorothy Bishop, Professor of Developmental Neuropsychology and a Wellcome Principal Research Fellow at the Department of Experimental Psychology in Oxford, Adjunct Professor at The University of Western Australia, Perth, and a runner up in the 2012 UK Science Blogging Prize for BishopBlog.