Journals, especially high-impact ones, are notorious for their lack of interest in publishing replication studies, particularly those that fail to obtain an interesting result. A recent paper by Lorna Halliday emphasises just how important replications are, above all when they involve interventions that promise to help children's development.
Halliday focused on a commercially available program called Phonomena, which was developed with the aim of improving children's ability to distinguish between speech sounds – a skill thought to be important for learning to read, as well as for those who are learning English as a second language. An initial study reported by Moore et al. in 2005 gave promising results. A group of 18 children who were trained for 6 hours using Phonomena showed improvements on tests of phonological awareness from pre- to post-training, whereas 12 untrained children did not.
In a subsequent study, however, Halliday and colleagues failed to replicate the positive results of Moore's group using similar methods and stimuli. Although children showed some learning of the contrasts they were trained on, this did not generalise to tests of phonology or language administered before and after the training session. Rather than just leaving us with this disappointing result, Halliday decided to investigate possible reasons for the lack of replication. Her analysis should be required reading for anyone contemplating an intervention study: it reveals a number of apparently trivial factors that appear to play a role in determining results.
The different results could not easily be accounted for by differences in the samples of children, who were closely similar in terms of their pre-training scores. In terms of statistical power, the Halliday sample was larger, so it should have had a better chance of detecting a true effect if one existed. There were some procedural differences in the training methods used in the two studies, but these led to better learning of the trained contrasts in the Halliday study, so we might have expected more transfer to the phonological awareness tests, when in fact the opposite was the case.
Halliday notes a number of factors that did differ between studies and which may have been key. First, the Halliday study used random assignment of children to training groups, whereas the Moore study gave training to one tutor group and used the other as a control group. This is potentially problematic because children will have been exposed to different teaching during the training interval. Second, both the experimenter and the children themselves were aware of which group was which in the Moore study. In the Halliday study, in contrast, two additional control groups were used who also underwent training. This avoids the problems of 'placebo' effects that can occur if children are motivated by the experience of training, or if they improve because of greater familiarity with the experimenter. Ideally, in a study like this, the experimenter should be blind to the child's group status. This was not the case for either of the studies, leaving them open to possible experimenter bias, but Halliday noted that in her study the experimenter did not know the child's pre-test score, whereas in the Moore study, the experimenter was aware of this information.
Drilling down to the raw data, Halliday noted an important contrast between the two studies. In the Moore study, the untreated control group showed little gain on the outcome measures of phonological processing, whereas in her study they showed significant improvements on two of the three measures. This may have been because the controls in the Halliday study were rewarded for participation, had regular contact with the experimenters throughout the study, and were tested at the end of the study by someone who was blind to their prior test score.
There has been much debate about the feasibility and appropriateness of using Randomised Controlled Trial methodology in educational settings. Such studies are hard to do, but their stringent methods have evolved for very good reasons: unless we carefully control all aspects of a study, it is easy to kid ourselves that an intervention has a beneficial effect, when in fact, a control group given similar treatment but without the key intervention component may do just as well.
Halliday LF (2014). A Tale of Two Studies on Auditory Training in Children: A Response to the Claim that 'Discrimination Training of Phonemic Contrasts Enhances Phonological Processing in Mainstream School Children' by Moore, Rosenberg and Coleman (2005). Dyslexia (Chichester, England) PMID: 24470350
Post written for the BPS Research Digest by guest host Dorothy Bishop, Professor of Developmental Neuropsychology and a Wellcome Principal Research Fellow at the Department of Experimental Psychology in Oxford, Adjunct Professor at The University of Western Australia, Perth, and a runner-up in the 2012 UK Science Blogging Prize for BishopBlog.