
Working memory training does not live up to the hype

According to CogMed, one of the larger providers of computerised working memory training, the benefits of such training are “comprehensive” and include “being able to stay focused, resist distractions, plan activities, complete tasks, and follow and contribute to complex discussions.” Similar claims are made by other providers such as Jungle Memory and Cognifit, which is endorsed by neuroscientist Susan Greenfield.

Working memory describes our ability to hold relevant information in mind for use in mental tasks, while ignoring irrelevant information. If it were possible to improve our working memory capacity and discipline through training, it makes sense that this would have widespread benefits. But that’s a big if.

A new meta-analysis by Monica Melby-Lervåg and Charles Hulme has just been published in the February issue of the respected APA journal Developmental Psychology (PDF is freely available). It combines the results from 23 studies of working memory training completed up to 2011. To be included, studies had to compare outcomes for a working memory training treatment group against outcomes in a control group. Most of the available studies involved healthy adults or children, with just a few involving children with developmental conditions such as ADHD.

The results were absolutely clear. Working memory training leads to short-term gains in working memory performance on tests that are the same as, or similar to, those used in the training. “However,” Melby-Lervåg and Hulme write, “there is no evidence that working memory training produces generalisable gains to the other skills that have been investigated (verbal ability, word decoding, arithmetic), even when assessments take place immediately after training.”

There was a modest, short-term benefit of the training on non-verbal intelligence but this disappeared when only considering the studies with a robust design (i.e. those that randomised participants across conditions and which enrolled control participants in some kind of activity). Similarly, there was a modest benefit of the training on a test of attentional control, but this disappeared at follow-up.

All of this suggests that working memory training isn’t increasing people’s working memory capacity in such a way that they benefit whenever they engage in any kind of task that leans on working memory. Rather, people who complete the training simply seem to have improved at the specific kinds of exercises used in the training, or possibly even just at computer tasks – effects which, anyway, wear off over time.

Overall, Melby-Lervåg and Hulme note that the studies that have looked at the benefits of working memory training have been poor in design. In particular, they tend not to bother enrolling the control group in any kind of intervention, which means any observed benefits of the working memory training could be related simply to the fun and expectations of being in a training programme, never mind the specifics of what that entails. Related to that, some dubious studies reported far-reaching benefits of the working memory training, without finding any improvements in working memory, thus supporting the notion that these benefits had to do with participant expectations and motivation.

A problem with all meta-analyses, this one included, is that they tend to rely on published studies, which means any unpublished results stuck in a filing cabinet get neglected. But of course, it’s usually negative results that get left in the drawer, so if anything, the current meta-analysis presents an overly rosy view of the benefits of working memory training.
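To see why the file drawer inflates a meta-analytic estimate, here is a minimal simulation sketch. It is not the authors' analysis: the number of studies, the group size, the true effect of zero and the rule that only significantly positive results get "published" are all assumptions chosen purely for illustration.

```python
import random
import statistics

def simulate_studies(n_studies=2000, true_effect=0.0, n_per_group=20, seed=1):
    """Simulate observed standardised effects for many small two-group studies.

    All parameter values are hypothetical, not taken from the meta-analysis.
    """
    rng = random.Random(seed)
    # Approximate standard error of a standardised mean difference
    # when both groups have n_per_group participants.
    se = (2 / n_per_group) ** 0.5
    return [(rng.gauss(true_effect, se), se) for _ in range(n_studies)]

def published_only(studies, z_crit=1.96):
    """Keep only significantly positive results (the assumed 'file drawer' filter)."""
    return [(d, se) for d, se in studies if d / se > z_crit]

studies = simulate_studies()
pub = published_only(studies)

all_mean = statistics.mean(d for d, _ in studies)
pub_mean = statistics.mean(d for d, _ in pub)

print(f"studies run: {len(studies)}, 'published': {len(pub)}")
print(f"mean effect, all studies:      {all_mean:.2f}")
print(f"mean effect, 'published' only: {pub_mean:.2f}")
```

Even though the simulated training has no effect at all, the average of the "published" studies comes out clearly positive, which is the direction of bias described above.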

Melby-Lervåg and Hulme’s ultimate conclusion was stark: “there is no evidence that these programmes are suitable as methods of treatment for children with developmental cognitive disorders or as ways of effecting general improvements in adults’ or children’s cognitive skills or scholastic achievements.”


Melby-Lervåg M, and Hulme C (2013). Is working memory training effective? A meta-analytic review. Developmental psychology, 49 (2), 270-91 PMID: 22612437 Free, full PDF of the study.

This meta-analysis only took in studies completed up to 2011. If you know of any quality studies into the effects of working memory training published since that time, please do share the relevant links via comments.

–Further reading–
Brain training games don’t work.
Brain training for babies actually works (short term, at least)

Post written by Christian Jarrett (@psych_writer) for the BPS Research Digest.

Emotion research gets real – Was this person just told a joke or told they have great hair?

How accurately could you tell from a person’s display of behaviour and emotions what just happened to them? Dhanya Pillai and her colleagues call this “retrodictive mindreading” and they say it’s a more realistic example of how we perceive emotions in everyday life, as compared with the approach taken by traditional psychological research, in which volunteers name the emotions displayed in static photos of people’s faces.

In Pillai’s study, the task of a group of 35 male and female participants wasn’t to look at pictures and name the facial expression. Instead, the participants watched clips of people reacting to a real-life social scenario and they had to deduce what scenario had led to that emotional display.

Half the challenge Pillai and her colleagues faced was to create the stimuli for this research. They recruited 40 men and women who thought they were going to be doing the usual thing and categorising emotional facial expressions. In fact, it was their own responses that were to become the stimuli for the study proper.

While these volunteers were sitting down ready for the “study” to start, one of four scenarios unfolded. The female researcher either told them a joke (“why did the woman wear a helmet at the dinner table? She was on a crash diet”); told them a story about a series of misfortunes she’d encountered on the way to work; paid them a compliment (e.g. “you’ve got really great hair, what shampoo do you use?”); or made them wait 5 minutes while she had a drink and did some texting. In each case the volunteers’ emotional responses were recorded on film and formed the stimuli for the real experiment.

The researchers ended up with 40 silent clips, lasting 3 to 9 seconds each, comprising ten clips for each of the four scenarios. The real participants for the study proper were first shown footage of the researcher in the four scenarios and how these were categorised as joke, story, compliment or waiting. Then these observer participants watched the 40 clips of the earlier volunteers, and their task in each case was to say which scenario the person in the video was responding to.

The observing participants’ performance was far from perfect – they averaged 60 per cent accuracy – but it was far better than the 25 per cent level you’d expect if they were merely guessing. They were by far the most skilled at recognising when a person was responding to the waiting scenario (90 per cent accuracy). Their accuracy was similar across the other three scenarios, at around 50 per cent. They achieved this success level despite the huge amount of variety in the way the different volunteers responded to the different scenarios. “From observing just a few seconds of a person’s reaction, it appears we can gauge what kind of event might have happened to that individual with considerable success,” the researchers said.
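How far above chance is 60 per cent? As a rough sketch, the snippet below asks how likely it would be for one observer to reach that score on the 40 clips by guessing among the four scenarios. The single observer scoring exactly the group average is a hypothetical simplification, and this is not the test the researchers themselves reported.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more correct by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_clips = 40      # number of clips, from the post
chance = 1 / 4    # four scenarios, so a guess is right a quarter of the time

# Hypothetical observer who scores exactly the reported 60 per cent average.
n_correct = round(0.60 * n_clips)

p_value = binomial_tail(n_correct, n_clips, chance)
print(f"P(at least {n_correct}/{n_clips} correct by guessing) = {p_value:.2e}")
```

Under these assumptions the probability is tiny, which is why a score that is "far from perfect" can still count as strong evidence of real skill.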

A surprise detail came from the recordings of the observing participants’ eye movements. They focused more on the mouth region than on the eyes. Based on past research (much of it using static facial displays), Pillai and her colleagues thought that better accuracy would go hand-in-hand with more attention paid to the eye region of the targets’ faces. In fact, for three of the scenarios (all except the joke), the opposite was true. This may be because focusing on the eye region is more beneficial when naming specific mental states, as opposed to the “retrodictive mindreading” challenge involved in the current study.

In contrast to much of the existing psychology literature, Pillai and her team concluded that theirs was an important step towards devising tasks “that closely approximate how we understand other people’s behaviour in real life situations.”


Pillai, D., Sheppard, E., and Mitchell, P. (2012). Can People Guess What Happened to Others from Their Reactions? PLoS ONE, 7 (11) DOI: 10.1371/journal.pone.0049859


Post written by Christian Jarrett (@psych_writer) for the BPS Research Digest.

A new test for finding out what people really think of their personality

A problem with your standard personality questionnaire is that most people like to make a good impression. This is especially the case when questionnaires are used for job candidates. One way around this is to use so-called implicit measures of personality, designed to probe subconscious beliefs. The famous Rorschach ink-blot test is one example, but many psychologists criticise it for its unreliability. A more modern example is a version of the implicit association test, in which people are timed using the same response key for self-referential words and various personality traits. If they associate the trait with themselves, they should be quicker to answer. Now a team led by Florin Sava have proposed a brand-new test based on what’s called the “semantic misattribution procedure”.

Nearly a hundred participants watched as personality traits were flashed one at a time for a fifth of a second on a computer screen. After each trait (e.g. “anxious”), a neutral-looking Chinese pictograph was flashed on-screen. The participants didn’t know what these Chinese symbols meant. Their task was to ignore the flashed personality traits and to say whether they’d like each Chinese symbol to be printed on a personalised t-shirt for them or not, to reflect their personality.

This method is based on past research showing that we tend to automatically misattribute the meaning of briefly presented words to subsequent neutral stimuli. So, in the example above, participants would be expected to attribute, at a subconscious level, the meaning of “anxious” to the Chinese symbol. When assessing the suitability of the symbol for their t-shirt, it feels subjectively as if they are merely guessing, or making their judgment based on its visual properties. But in fact their choice of whether the symbol is suitable will be influenced by the anxious meaning they’ve attributed to it, and, crucially, whether or not they have an implicit belief that they are anxious.

In this initial study, and two more involving nearly 300 participants, Sava and his colleagues showed that participants’ scores on this test for conscientiousness, neuroticism and extraversion correlated with explicit measures of the same traits. The new implicit test also did a better job than explicit measures alone of predicting relevant behaviours, such as church attendance, perseverance on a lab task, and punctuality. The implicit scores for extraversion showed good consistency over 6 months. Finally, the new implicit test showed fewer signs of being influenced by social desirability concerns, as compared with traditional explicit measures. Next, the researchers plan to test whether their new implicit measure is immune to attempts at deliberate fakery.

“The present study suggests that the Semantic Misattribution Procedure is an effective alternative for measuring implicit personality self-concept,” the researchers said.


Sava, F., Maricuțoiu, L., Rusu, S., Macsinga, I., Vîrgă, D., Cheng, C., and Payne, B. (2012). An Inkblot for the Implicit Assessment of Personality: The Semantic Misattribution Procedure. European Journal of Personality, 26 (6), 613-628 DOI: 10.1002/per.1861

–Further reading–
A personality test that can’t be faked

Post written by Christian Jarrett (@psych_writer) for the BPS Research Digest.

Labs worldwide report converging evidence that undermines the low-sugar theory of depleted willpower

One of the main findings in willpower research is that it’s a limited resource. Use self-control up in one situation and you have less left over afterwards – an effect known as “ego-depletion”. This discovery led to a search for the underlying physiological mechanism. In 2007, Roy Baumeister, a pioneer in the field, and his colleagues reported that the physiological correlate of ego-depletion is low glucose. Self-control leads the brain to metabolise more glucose, so the theory goes, and when glucose gets too low, we’re left with less willpower.

The breakthrough 2007 study showed that ego-depleted participants had low blood glucose levels, but those who subsequently consumed a glucose drink were able to sustain their self-control on a second task. In the intervening years the finding has been replicated and the glucose-willpower link has come to be stated as fact.

“No glucose, no willpower,” wrote Baumeister and his journalist co-author John Tierney in their best-selling popular psychology book Willpower: Rediscovering Our Greatest Strength (Allen Lane, 2012). The claim was also endorsed in a guide to willpower published by the American Psychological Association earlier this year. “Maintaining steady blood-glucose levels, such as by eating regular healthy meals and snacks, may help prevent the effects of willpower depletion,” the report claims.

But now two studies have come along at once (following another published earlier in the year) that together cast doubt on the idea that depleted willpower is caused by a lack of glucose availability in the brain. In the first, Matthew Sanders and his colleagues in the US report what they call the “Gargle effect”. They had dozens of students look through a stats book and cross out just the Es, a tiresome task designed to tax their self-control levels. Next, they completed the famous Stroop task – naming the ink colour of words while ignoring their meaning. Crucially, half the participants completed the Stroop challenge while gargling sugary lemonade, the others while gargling lemonade sweetened artificially with Splenda. The participants who gargled, but did not swallow, the sugary (i.e. glucose-containing) lemonade performed much better on the Stroop task.

The participants in the glucose condition didn’t consume the glucose and even if they had, there was no time for it to be metabolised. So this effect can’t be about restoring low glucose levels. Rather, Sanders’ team think glucose binds to receptors in the mouth, which has the effect of activating brain regions involved in reward and self-control – the anterior cingulate cortex and striatum.

The other study that’s just come out was conducted by Martin Hagger and Nikos Chatzisarantis based in Australia and the UK. Their approach was similar to Sanders’ except that participants gargled and spat out a glucose or artificially sweetened solution prior to performing a second taxing task, rather than during. Also, this research comprised a series of five experiments that tested people’s self-control in many different ways, including: resisting delicious cookies; reading boring text in an expressive style; attempting unsolvable puzzles; and squeezing hand-grips. But the take-home finding was the same – participants who gargled, but did not swallow, a glucose drink performed better on a subsequent test of their willpower; participants who gargled an artificially sweetened drink did not. So again, willpower was restored without topping up glucose levels. Moreover, the benefit of gargling glucose was displayed only by participants who’d had their self-control taxed in an initial task. It made no difference to participants who were already in an untaxed state.

Hagger and Chatzisarantis agree with the interpretation of Sanders’ group, but they distinguish two possibilities. Glucose binding to receptors in the mouth could stimulate activity in brain regions, like the anterior cingulate, that tend to show fatigue after a taxing task. Alternatively, glucose in the mouth could trigger reward-related activity that prompts participants to interpret a task as more rewarding, thus boosting their motivation. The two explanations are not mutually exclusive.

The key point is that the new results suggest depleted willpower is about motivation and the allocation of glucose resources, not about a lack of glucose. These findings don’t prove that consuming glucose has no benefit for restoring willpower, but they suggest strongly that it’s not the principal mechanism. It’s notable that the new findings complement previous research in the sports science literature showing that gargling (without ingesting) glucose can boost cycling performance.

“While our findings are consistent with the predictions of the resource-depletion account, they also contribute to an increasing literature that glucose may not be a candidate physiological analog for self-control resources,” write Hagger and Chatzisarantis. “Instead ego-depletion may be due to problems of self-control resource allocation rather than availability.” An important next step is to conduct brain-imaging and related studies to observe the physiological effects of gargling glucose on the brain, and on motivational beliefs. There are also tantalising applications from the new research – for example, could the gargle effect (perhaps in the form of glucose-infused chewing gum) be used as a willpower aid for dieters and people trying to give up smoking?


Hagger, M., and Chatzisarantis, N. (2012). The Sweet Taste of Success: The Presence of Glucose in the Oral Cavity Moderates the Depletion of Self-Control Resources. Personality and Social Psychology Bulletin DOI: 10.1177/0146167212459912

Sanders, M., Shirk, S., Burgin, C., and Martin, L. (2012). The Gargle Effect: Rinsing the Mouth With Glucose Enhances Self-Control. Psychological Science DOI: 10.1177/0956797612450034

–Further reading–
From The Psychologist: Roy F. Baumeister outlines intriguing and important research into willpower and ego depletion.

Post written by Christian Jarrett for the BPS Research Digest.

Who gets aggressive at the late-night bar and why?

The exhaustive analysis in Steven Pinker’s latest book shows that we are living in the most peaceable age for thousands of years. To anyone who spends time in late-night bars, this might come as a surprise. In these temples to hedonism, spilled drinks and unwelcome gropes all too often provoke violent brawls.

Kathryn Graham and her colleagues trained 148 observers and sent them out to 118 bars in early-hours Toronto where they recorded 1,057 instances of aggression from 1,334 visits. Where the majority of psychology research on aggression is based on laboratory simulations (often involving participants zapping each other with loud noise or spiking each other’s food with chilli sauce), Graham’s team collected real-life observational data to find out who gets aggressive and why.

The researchers followed the Theory of Coercive Actions, according to which aggressive acts have one or more motives: compliance (getting someone to do something, or stop doing something); grievance; social identity (to prove one’s status and power); and thrill-seeking.

Unsurprisingly, the vast majority (77.5 per cent) of aggressive acts were instigated by men. Men more than women were driven to aggression by identity and thrill-seeking motives; by contrast female aggression was more often motivated by compliance and grievance. Female aggression often had a defensive intent, as a reaction against unwanted sexual advances.

As well as being particularly severe, aggression that was ignited by patrons who felt threats to their identity was also particularly likely to escalate, “because,” the researchers said, “their strong identity motivation reflects a situation where the person is already invested in winning or besting the other person.” Aggressive acts motivated by grievance were also likely to escalate, because of people feeling their actions were justified.

The researchers found that greater intoxication led to more serious aggression in women, but not men – perhaps because the latter were emboldened enough already. Younger men and bigger men also tended to engage in more serious aggressive acts, replicating past research showing that larger, intoxicated men are more likely to get aggressive than their smaller counterparts.

Graham and her colleagues said their findings could help contribute to preventative policies in late-night bars. For example, given the incendiary role of identity motives in aggressive incidents, efforts could be made to challenge traditional cultural norms that say masculine identity is about power and strength. Because of the escalating effect of grievance motives, security staff could be trained to defuse situations early – for example, by replacing spilled drinks free of charge. And because so much female aggression was provoked by sexual harassment, the researchers advised establishments to create an atmosphere that discourages “invasive and aggressive sexual overtures whilst still maintaining an exciting venue where young people can explore their sexuality and meet potential partners.”

These recommendations sound well-intentioned and supported by the new evidence, but are they really achievable? What do you think?


Kathryn Graham, Sharon Bernards, D. Wayne Osgood, Michael Parks, Antonia Abbey, Richard B. Felson, Robert F. Saltz, and Samantha Wells (2012). Apparent motives for aggression in the social context of the bar. Psychology of Violence DOI: 10.1037/a0029677

Post written by Christian Jarrett for the BPS Research Digest.

Anonymity may spoil the accuracy of data collected through questionnaires

Thousands of psychology papers are based on data derived from questionnaires that were filled out anonymously. That’s because most psychologists have reasoned that the way to get people to be honest about their practice of undesirable behaviours is to promise them anonymity. But in a new analysis, Yphtach Lelkes and his colleagues point out that anonymity comes with a price. Participants will feel less accountable and may be less motivated to answer questions accurately.

To test this, Lelkes’ team devised a cunning methodology in which dozens of undergrads conducted internet research for what they thought was a study into the way that people search for information on the web.

After each student had spent 45 minutes researching the mountain pygmy-possum, a researcher made a show of deleting the student’s search history before their eyes, ostensibly to prevent the next participant from accessing the browser’s archives. In fact, a spyware programme was installed on the computer and kept track of all the sites visited. After the research session, each student answered a questionnaire about their use of the internet in general and their experience of the internet research task, including which sites they’d searched. Crucially, half the students were instructed to fill out their name and other personal details at the top of the questionnaire; the others were told to leave it blank to ensure anonymity.

Students who answered the questionnaire anonymously admitted to more embarrassing internet behaviours in general, such as looking at porn, but when it came to their searches during the research task specifically, they answered with less accuracy. There was also evidence of a lack of variety in many of the anonymous students’ later answers, consistent with the idea that they were putting less thought and effort into the questionnaires as they grew tired.

Two follow-up studies involved dozens more students having the opportunity to eat M&M sweets and jelly beans while they completed questionnaires. A question at the end asked them to report how many they’d eaten and once again, students who answered anonymously were less accurate about how much they’d indulged. This was the case whether anonymity was promised before or after the opportunity to eat the snacks.

Lelkes and his colleagues were cautious about how far these findings can be generalised. For example, the same problems might not apply when people are interviewed face-to-face but promised confidentiality. However, they warned researchers against assuming that promising participants anonymity means that they will provide better quality answers. “Particularly among college students who often complete questionnaires to fulfil course requirements, such a guarantee may serve to sanction half-hearted survey completion rather than freeing students up to respond with greater honesty.”


Yphtach Lelkes, Jon A. Krosnick, David M. Marx, Charles M. Judd, and Bernadette Park (2012). Complete anonymity compromises the accuracy of self-reports. Journal of Experimental Social Psychology DOI: 10.1016/j.jesp.2012.07.002

–Further reading–
Another recent study found that anonymous web participants provided quality data for psychological experiments.
Our bias for the left-hand side of space could be distorting large-scale surveys.
Is a taste for extreme answers distorting cross-cultural comparisons of personality?

Post written by Christian Jarrett for the BPS Research Digest.

Parents underestimate their children’s worry levels and overestimate their optimism

It’s well-established that parents frequently overestimate their children’s intelligence and the amount of exercise they get. Now a team led by Kristin Lagattuta has uncovered evidence suggesting that parents have an unrealistically rosy impression of their kiddies’ emotional lives too. It’s a finding with important implications for clinicians and child researchers who often rely on parental reports of young children’s psychological wellbeing.

It’s previously been assumed that children younger than seven will struggle to answer questions about their emotions. Undeterred, Lagattuta and her colleagues simplified the language used in a popular measure of older children’s anxiety and they developed a pictorial scoring system that involved the children pointing to rectangles filled with different amounts of colour. Time was taken to ensure the child participants understood how to use the scale.

An initial study with 228 psychologically healthy children aged 4 to 11 from relatively affluent backgrounds found that the children’s answers to oral questions about their experience of worry (including general anxiety, panic, social phobia and separation anxiety) failed to correlate with their parents’ (usually the mother’s) written responses to questions about the children’s experience of worry. Specifically, the parents tended to underestimate how much anxiety their children experienced.

A second study was similar, but this time the researchers ensured the parents and children answered items that were worded in exactly the same way; the parents were reassured that it was normal for children to experience some negative emotion; and the parents were able to place their completed questionnaires in envelopes for confidentiality. Still the children’s answers about their own emotions failed to correlate with parents’ answers, with the parents again underestimating the amount of worry experienced by their children.

A revealing detail in this study was that parents also answered questions about their own emotions. Their scores for their own emotions correlated with the answers they gave for their children’s experiences. “These data suggest that even parents from a low-risk, non-clinical sample may have difficulty separating their emotional perspective from that of their child,” the researchers said.

Finally, 90 more children aged 5 to 10 answered questions about their optimism, whilst their parents also answered questions about their own and their children’s optimism. Again, parents’ and children’s verdicts on the children’s emotions failed to correlate, with the parents now overestimating their children’s experience of optimism. And once more, parents’ own optimism was related to how they interpreted their children’s optimism.

Lagattuta and her colleagues admitted that it’s theoretically possible that the children were the ones showing a distorted view of their own emotions, and it’s the parents who were painting the true picture. However, they think this is highly unlikely. For starters it’s revealing that parents underestimated their children’s negative emotion and yet over-estimated their positive emotion, which argues against the idea that the children were simply answering more conservatively, or giving systematically extreme answers in one direction. Moreover, the new findings fit with the wider literature showing how parents tend to have an unrealistically rosy impression of their children’s wellbeing. An obvious study limitation is the focus on middle class US participants, so there is of course a need to replicate with people from other backgrounds and cultures.

“From the standpoint of research and clinical practice, this mismatch between parent and child perceptions raises a red flag,” the researchers concluded. “Internally consistent self-report data can be acquired from young children regarding their emotional experiences. Obtaining reports from multiple informants – including the child – needs to be the standard.”

Lagattuta KH, Sayfan L, and Bamford C (2012). Do you know how I feel? Parents underestimate worry and overestimate optimism compared to child self-report. Journal of experimental child psychology, 113 (2), 211-32 PMID: 22727673

Post written by Christian Jarrett for the BPS Research Digest.

Most brain imaging papers fail to provide enough methodological detail to allow replication

Amidst recent fraud scandals in social psychology and other sciences, leading academics are calling for a greater emphasis to be placed on the replicability of research. “Replication is our best friend because it keeps us honest,” wrote the psychologists Chris Chambers and Petroc Sumner recently.

For replication to be possible, scientists need to provide sufficient methodological detail in their papers for other labs to copy their procedures. Focusing specifically on fMRI-based brain imaging research (a field that’s no stranger to controversy), University of Michigan psychology grad student Joshua Carp has reported a worrying observation – the vast majority of papers he sampled failed to provide enough methodological detail to allow other labs to replicate their work.

Carp searched the literature from 2007 to 2011 looking for open-access human studies that mentioned “fMRI” and “brain” in their abstracts. Of the 1392 papers he identified, Carp analysed a random sample of 241 brain imaging articles from 68 journals, including PLoS One, NeuroImage, PNAS, Cerebral Cortex and the Journal of Neuroscience. Where an article featured supplementary information published elsewhere, Carp considered this too.

There was huge variability in the methodological detail reported in different studies, and often the amount of detail was woeful, as Carp explains:

“Over one third of studies did not describe the number of trials, trial duration, and the range and distribution of inter-trial intervals. Fewer than half reported the number of subjects rejected from analysis; the reasons for rejection; how or whether subjects were compensated for participation; and the resolution, coverage, and slice order of functional brain images.”

Other crucial detail that was often omitted included information on correcting for slice acquisition timing, co-registering to high-resolution scans, and the modelling of temporal auto-correlations. In all, Carp looked at 179 methodological decisions. To non-specialists, some of these will sound like highly technical detail, but brain imagers know that varying these parameters can make a major difference to the results that are obtained.

One factor that non-specialists will appreciate relates to corrections made for problematic head-movements in the scanner. Only 21.6 per cent of analysed studies described the criteria for rejecting data based on head movements. Another factor that non-specialists can easily relate to is the need to correct for multiple comparisons. Of the 59 per cent of studies that reported using a formal correction technique, nearly one third failed to reveal what that technique was.
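To see why multiple comparisons matter, here is a minimal Python sketch. It assumes the tests are independent, which is not true of neighbouring points in a brain scan, and it uses the Bonferroni correction purely as a familiar example; Carp's paper is not quoted here on which corrections the studies used. The function names are invented for illustration.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


# Assumed test counts, chosen only to show how quickly the risk grows.
for n_tests in (1, 10, 100, 1000):
    uncorrected = familywise_error_rate(0.05, n_tests)
    corrected = familywise_error_rate(bonferroni_threshold(0.05, n_tests), n_tests)
    print(f"{n_tests:>5} tests: uncorrected {uncorrected:.3f}, Bonferroni {corrected:.3f}")
```

Without a correction, the chance of at least one spurious "finding" climbs towards certainty as the number of tests grows, which is why readers need to know exactly which correction a study applied.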

“The widespread omission of these parameters from research reports, documented here, poses a serious challenge to researchers who seek to replicate and build on published studies,” Carp said.

As well as looking at the amount of methodological detail shared by brain imagers, Carp was also interested in the variety of techniques used. This is important because the more analytical techniques and parameters available for tweaking, the more risk there is of researchers trying different approaches until they hit on a significant result.

Carp found 207 combinations of analytical techniques (including 16 unique data analysis software packages) – that’s nearly as many different methodological approaches as studies. Although there’s no evidence that brain imagers are indulging in selective reporting, the abundance of analytical techniques and parameters is worrying. “If some methods yield more favourable results than others,” Carp said, “investigators may choose to report only the pipelines that yield favourable results, a practice known as selective analysis reporting.”
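The risk Carp describes can be illustrated with a toy simulation. The Python sketch below is not based on his data or on any real fMRI pipeline. It assumes made-up studies with no true effect, and four hypothetical "pipelines" that differ only in which subjects they keep. It then compares how often a single pre-chosen analysis reaches p < .05 with how often the best of the four does.

```python
import random
from statistics import NormalDist


def two_sided_p(sample):
    """Two-sided z-test p value for a mean of 0, assuming a known SD of 1."""
    z = sum(sample) / len(sample) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_studies=5000, n_subjects=40, seed=1):
    """Return (single-pipeline rate, best-of-several rate) of false positives."""
    rng = random.Random(seed)
    single = flexible = 0
    for _ in range(n_studies):
        data = [rng.gauss(0, 1) for _ in range(n_subjects)]  # no true effect
        # Hypothetical analysis choices: which subjects to include.
        pipelines = [data, data[:20], data[20:], data[:30]]
        p_values = [two_sided_p(subset) for subset in pipelines]
        single += p_values[0] < 0.05    # analysis fixed in advance
        flexible += min(p_values) < 0.05  # report whichever pipeline "worked"
    return single / n_studies, flexible / n_studies


single_rate, flexible_rate = simulate()
print(f"false positives, one pipeline:     {single_rate:.3f}")
print(f"false positives, best of four:     {flexible_rate:.3f}")
```

Because the pre-chosen analysis is one of the four options, picking the best result can never do worse than it. This is the sense in which more available pipelines mean more opportunities for a chance result to look significant.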

The field of medical research has adopted standardised guidelines for reporting randomised clinical trials. Carp advocates the adoption of similar standardised reporting rules for fMRI-based brain imaging research. Relevant guidelines were proposed by Russell Poldrack and colleagues in 2008, although these may now need updating.

Carp said the reporting practices he uncovered were unlikely to reflect malice or dishonesty. He thinks researchers are merely following the norms in the field. “Unfortunately,” he said, “these norms do not encourage researchers to provide enough methodological detail for the independent replication of their findings.”


Carp J (2012). The secret lives of experiments: Methods reporting in the fMRI literature. NeuroImage, 63 (1), 289-300 PMID: 22796459

–Further reading– Psychologist magazine opinion special on replication.
An uncanny number of psychology findings manage to scrape into statistical significance.
Questionable research practices are rife in psychology, survey finds.

Post written by Christian Jarrett for the BPS Research Digest.

Another look at the "magical" benefit of frequent family meals

“The statistics are clear,” Nancy Gibbs wrote in her article for Time magazine in 2006 entitled The Magic of the Family Meal: “Kids who dine with the folks are healthier, happier and better students”. She’s right, there is lots of evidence showing these positive associations, and there are plausible explanations for the benefits, such as a chance for children and parents to talk, and the sense of structure that the ritual provides.

But as Daniel Miller and his colleagues point out in their new study, the supposed benefit of frequent family meals is based on research with limitations. Many studies have been cross-sectional snap-shots in time – so it’s possible that frequent family meals are merely a proxy for other relevant factors, such as warmer family relations or parental wealth and education. And the causal direction could run backwards. Maybe parents are more inclined to dine with children who are happier and better behaved.

Miller’s team have conducted a comprehensive, longitudinal study using data that was collected from 1998 – when 21,400 participating US children were aged 5 years – to 2007, by which time the average age of the remaining 9,700 participants was 13.6. At five time points during that period, the children’s parents were surveyed about how often they ate as a family at breakfast and dinner; the children’s reading and maths abilities were assessed; and teachers were surveyed about the children’s behaviour.

The results were clear – there was little or no evidence (depending on the precise analysis used) of any association between more family meals at earlier time points and better outcomes later, in terms of the children’s academic abilities or good behaviour. “Our results suggest that the findings of previous work regarding frequency of family meals and adolescent outcomes should be viewed with some caution,” the researchers said.

But we shouldn’t be too hasty about dismissing the value of family meals. This study comes with its own caveats. Chief among these is that the children were younger than in most other studies on this issue. Relevant here is that past research has linked frequent family meals with outcomes such as less substance abuse among older teenagers – a potential benefit that was not addressed in this study given the younger sample. Another problem, acknowledged by the researchers, was the reliance on parental reports about the frequency of family meal times. A suspiciously high number of parents reported having family meals every day of the week. If they were lying it could have affected the trustworthiness of the results, although the researchers think this is unlikely based on some checks they made of their data.

Taken all together, Miller and his colleagues said their study should be seen as “an extension rather than a repudiation of previous work”. Their cautious conclusion is that “the magnitude of the effect of family meal frequency may be less than suggested by previous work.”


Daniel Miller, Jane Waldfogel, and Wen-Jui Han (2012). Family meals and child academic and behavioural outcomes. Child Development DOI: 10.1111/j.1467-8624.2012.01825.x

–Further reading– You are what you eat? Meal type, socio-economic status and cognitive ability in childhood.

Post written by Christian Jarrett for the BPS Research Digest.

Made it! An uncanny number of psychology findings manage to scrape into statistical significance

Like a tired boxer at the Olympic Games, the reputation of psychological science has just taken another punch to the gut. After a series of fraud scandals in social psychology and a US survey that revealed the widespread use of questionable research practices, a paper published this month finds that an unusually large number of psychology findings are reported as “just significant” in statistical terms.

The pattern of results could be indicative of dubious research practices, in which researchers nudge their results towards significance, for example by excluding troublesome outliers or adding new participants. Or it could reflect a selective publication bias in the discipline – an obsession with reporting results that have the magic stamp of statistical significance. Most likely it reflects a combination of both these influences. On a positive note, psychology, perhaps more than any other branch of science, is showing an admirable desire and ability to police itself and to raise its own standards.

E. J. Masicampo at Wake Forest University, USA, and David Lalande at Université du Québec à Chicoutimi, analysed 12 months of issues, July 2007 – August 2008, from three highly regarded psychology journals – the Journal of Experimental Psychology: General; Journal of Personality and Social Psychology; and Psychological Science.

In psychology, a common practice is to determine how probable (p) it is that the observed results in a study could have been obtained if the null hypothesis were true (the null hypothesis usually being that the treatment or intervention has no effect). The convention is to consider a probability of less than five per cent (p < .05) as an indication that the treatment or intervention really did have an influence; the null hypothesis can be rejected (this procedure is known as null hypothesis significance testing).
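To make the idea of a p value concrete, here is a minimal Python sketch using an invented example, not data from any study discussed here. Suppose 16 of 20 participants improve after some intervention, and the null hypothesis says each participant is as likely to improve as not. The p value is then the probability of a result at least this extreme under that null. The function name and the numbers are assumptions chosen for illustration.

```python
from math import comb


def one_sided_binomial_p(successes: int, n: int, p_null: float = 0.5) -> float:
    """Probability of at least `successes` out of `n` if the null is true."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical study: 16 of 20 participants improve.
p_value = one_sided_binomial_p(16, 20)
print(f"p = {p_value:.4f}")
print("significant at .05" if p_value < 0.05 else "not significant at .05")
```

The .05 cut-off applied in the last line is only the convention described above; nothing in the calculation itself singles out that value.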

From the 36 journal issues, Masicampo and Lalande identified 3,627 reported p values between .01 and .10. Their method was to see how evenly the p values were spread across that range (only studies that reported a precise figure were included). To avoid a bias in their approach, they counted the number of p values falling into “buckets” of different sizes, either .01, .005, .0025 or .00125, across the range.

The spread of p values between .01 and .10 followed an exponential curve – from .10 to .01 the number of p values increased gradually. But here’s the key finding – there was a glaring bump in the distribution between .045 and .050. The number of p values falling in this range was “much greater” than you’d expect based on the frequency of p values falling elsewhere in the distribution. In other words, an uncanny abundance of reported results just sneaked into the region of statistical significance.
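The bucket-counting idea can be sketched in Python as follows. This is a rough illustration under stated assumptions, not Masicampo and Lalande's actual procedure. The p values are synthetic, drawn from a smooth decaying curve with an artificial surplus added just below .05. The bucket width is fixed at .005, and the curve is fitted by simple least squares on log counts. All function names are invented.

```python
import math
import random

# Range .01 to .10 and bucket width .005, in units of 0.0001.
LO, HI, WIDTH = 100, 1000, 50


def bucket_counts(p_values):
    """Count p values (rounded to four decimals) in equal-width buckets."""
    counts = [0] * ((HI - LO) // WIDTH)
    for p in p_values:
        units = round(p * 10000)  # integer arithmetic avoids float edge cases
        if LO <= units < HI:
            counts[(units - LO) // WIDTH] += 1
    return counts


def expected_from_exponential_fit(counts, target):
    """Fit count = a * exp(b * bucket_index) to every bucket except `target`
    and return the fitted count for `target`."""
    points = [(i, math.log(c)) for i, c in enumerate(counts) if i != target and c > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sum(
        (x - mean_x) ** 2 for x, _ in points
    )
    return math.exp(mean_y + slope * (target - mean_x))


# Synthetic data (an assumption): smooth decay plus a surplus in .045-.050.
rng = random.Random(0)
rate = 20.0
top, bottom = math.exp(-rate * 0.01), math.exp(-rate * 0.10)
smooth = [-math.log(top - rng.random() * (top - bottom)) / rate for _ in range(3000)]
bump = [rng.uniform(0.045, 0.050) for _ in range(120)]

counts = bucket_counts(smooth + bump)
target = (450 - LO) // WIDTH  # the bucket starting at .045
expected = expected_from_exponential_fit(counts, target)
print(f"observed {counts[target]}, expected about {expected:.0f} from the fitted curve")
```

Leaving the .045 to .050 bucket out of the fit means the curve describes what the rest of the distribution predicts for that bucket, so any surplus shows up as observed counts well above expected.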

“Biases linked to achieving statistical significance appear to have a measurable impact on the research publication process,” the researchers said.

The same general pattern was found regardless of whether Masicampo and Lalande analysed results from just one journal or all of them together, and mostly regardless of the size of the distribution buckets they looked at. Of course, there’s a chance the intent behind their investigations could have biased their analyses in some way. To check this, a research assistant completely blind to the study aims analysed p values from one of the journals – the same result was found.

Masicampo and Lalande said their findings pointed to the need to educate researchers about the proper interpretation of null hypothesis significance testing and the value of alternative approaches, such as reporting effect sizes and confidence intervals. “… [T]he field may benefit from practices aimed at counteracting the single-minded drive toward achieving statistical significance,” they said.


Masicampo EJ, and Lalande DR (2012). A peculiar prevalence of p values just below .05. Quarterly Journal of Experimental Psychology PMID: 22853650

Post written by Christian Jarrett for the BPS Research Digest.