Psychology’s Replication Crisis Is Running Out of Excuses

#1 · Ninshub 2018-11-24, 09:04 PM Administrator

Psychology’s Replication Crisis Is Running Out of Excuses

Another big project has found that only half of studies can be repeated. And this time, the usual explanations fall flat.
Ed Yong
Nov 19, 2018

Quote: That failure rate is especially galling, says Simine Vazire from the University of California at Davis, because the Many Labs 2 teams tried to replicate studies that had made a big splash and been highly cited. Psychologists “should admit we haven’t been producing results that are as robust as we’d hoped, or as we’d been advertising them to be in the media or to policy makers,” she says. “That might risk undermining our credibility in the short run, but denying this problem in the face of such strong evidence will do more damage in the long run.”

#2 · Chris 2018-11-25, 01:02 AM Unregistered

(2018-11-24, 09:04 PM)Ninshub Wrote: Psychology’s Replication Crisis Is Running Out of Excuses

Another big project has found that only half of studies can be repeated. And this time, the usual explanations fall flat.
Ed Yong
Nov 19, 2018

Here's the preprint containing the results of the study:
https://psyarxiv.com/9654g/

#3 · Chris 2018-11-25, 03:23 PM Unregistered

(2018-11-25, 01:02 AM)Chris Wrote: Here's the preprint containing the results of the study:
https://psyarxiv.com/9654g/

The main purpose of the study was to see how heterogeneous the results would be in replication attempts carried out using a common protocol, but in different labs with subjects selected from different populations. That was because such heterogeneity had been suggested as a possible reason for failures to replicate experimental results in psychology.

The study found that only about half the previously published results were replicated with this much larger sample. Of the effects that were still found to be significant, the effect size was only about a quarter of what had been found previously (that comparison was for WEIRD subjects - Western, educated, industrialised, rich, democratic).

For heterogeneity, the finding is that it's mostly only the larger effects that show significant heterogeneity between the different subsamples. That is using Cochran's Q statistic, but they have set the bar for significance extremely high, at a p value of 0.001, and I can't see any discussion of why they've done that. They also calculate another heterogeneity statistic, I squared (I²), and for that they use a more conventional 95% confidence interval. On the basis of those intervals it looks as though there could be quite a bit of heterogeneity between subsamples for all the effects.
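
For anyone not familiar with the two heterogeneity statistics mentioned above, here is a minimal sketch in Python of how Cochran's Q and I squared are commonly computed from per-site effect estimates, using inverse-variance weights. It is not the Many Labs 2 analysis code; the function name and the three made-up effect sizes and variances are purely illustrative assumptions.

```python
def cochran_q_i2(effects, variances):
    """Return (Q, degrees of freedom, I-squared in percent).

    effects   -- one effect estimate per site (hypothetical inputs)
    variances -- the sampling variance of each estimate
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two effects with matching variances")

    # Inverse-variance weights: more precise sites count for more.
    weights = [1.0 / v for v in variances]

    # Fixed-effect pooled estimate (weighted mean of the site effects).
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I-squared: share of Q in excess of what chance alone would give,
    # truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Made-up numbers for three sites, not taken from the preprint.
q, df, i2 = cochran_q_i2([0.10, 0.30, 0.50], [0.01, 0.01, 0.01])
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.0f}%")
```

With these invented inputs Q comes out at about 8 on 2 degrees of freedom and I squared at about 75%. Under the null hypothesis of no heterogeneity, Q is referred to a chi-squared distribution with k - 1 degrees of freedom (k being the number of sites), so whether a given Q counts as "significant" depends directly on the threshold chosen, which is why the 0.001 cut-off matters.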

But considering that in half the cases the effect disappeared altogether, and in the other half it shrank to about a quarter of its originally reported size, it is probably that shrinkage and the reasons underlying it, rather than heterogeneity, that explain the failures to replicate.