Peter Bancel's simulation of Questionable Research Practices


Another preprint by Peter Bancel, entitled "Simulating Questionable Research Practices", has been uploaded to ResearchGate:
https://www.researchgate.net/publication..._Practices

The questionable practices considered include failure to respect the distinction between pilot and confirmatory studies, arbitrary exclusion of trials, optional stopping and extending, multiple analysis, publication bias and outright fraud. He concludes that these practices don't account for the evidence from Ganzfeld experiments, and also says the analysis suggests that they can't explain the heterogeneity of the data from micro-psychokinesis, potentially supporting the genuineness of the phenomenon.

Abstract:
Meta-analysis provides evidence for psi effects across a number of well-established protocols. However, the drawbacks of meta-analysis, which are well-known, can weaken the evidence, particularly for researchers in other disciplines who are not familiar with the parapsychological literature. Moreover, recent scrutiny on the variety and frequency of questionable research practices (Qrps) - methodological problems such as publication bias or unplanned analyses that can lead to spurious effects - has called into question meta-analytical evidence across many fields. A paper by Bierman, Spottiswoode and Bijl uses Monte Carlo simulation to test whether meta-analytic databases can be explained by Qrps alone. The paper is novel in that it attempts to simulate a comprehensive set of Qrps operating simultaneously. This work presents a reformulation of the problem that improves the power of Qrp tests and speeds up simulation times by a factor of roughly 1000. The method is applied to the Ganzfeld database. In addition, it is shown how understanding Qrps can lead to insights about other databases, namely those of micro-PK RNG and Global Consciousness Project (GCP) experiments. Specifically, it is shown that a broad set of Qrps fails to account for the Ganzfeld data, even if these are used in maximal combination and are adopted by researchers at frequencies approaching 100%. Applied to the micro-PK data, it is argued that, although the data are not amenable to full simulation, the Qrp analysis suggests that the heterogeneity cannot be explained by maximal Qrps. If substantiated, this conjecture would supply new support for a micro-PK psi effect and help clarify some of the confusion about this complex database. The GCP is fully pre-registered and hence generally immune to Qrps. Qrp analysis is consistent with this framing. It is indicated how the GCP, while exhibiting a real psi effect, does not provide evidence for its Global Consciousness hypothesis.
I think this is an interesting attempt to determine whether the significant results indicated by meta-analyses could be explained by a combination of different "questionable research practices" (QRPs). Essentially it's a more elaborate version of previous attempts to estimate the "file drawer effect". Computational methods are used to simulate the effect of several QRPs acting together, and to test whether they could account for the observed results of meta-analyses. The observed results are (1) the frequency distribution of different p-values for the studies covered by the meta-analysis - in this case represented by 6 "bins" covering different ranges of p-values - and (2) other calculated statistics - in this case, only the effect size.
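
To make the "bins" idea concrete, here is a minimal sketch of how a set of study p-values could be reduced to counts in 6 ranges. The bin edges below are my own hypothetical choice for illustration, since the actual ranges used in the preprint are not quoted here, and `bin_p_values` is just an illustrative helper name.

```python
from bisect import bisect_right

# Hypothetical interior bin edges (an assumption, not taken from the preprint).
# Five interior edges split the interval [0, 1] into six bins.
BIN_EDGES = [0.01, 0.05, 0.1, 0.3, 0.6]


def bin_p_values(p_values, edges=BIN_EDGES):
    """Count how many p-values fall into each of the len(edges) + 1 bins."""
    counts = [0] * (len(edges) + 1)
    for p in p_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # bisect_right returns the index of the bin whose range contains p;
        # a value equal to an edge goes into the bin that starts at that edge.
        counts[bisect_right(edges, p)] += 1
    return counts


if __name__ == "__main__":
    example = [0.001, 0.02, 0.07, 0.2, 0.5, 0.9]
    print(bin_p_values(example))
```

The observed counts would then be compared with the counts produced by simulated studies under a given combination of QRPs.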

The method works by finding the optimal combination of QRPs, and working out an overall p-value for the hypothesis that the observed results could be explained by that combination. The smaller this p-value, the more implausible it is that the results can be explained in this way.
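
As a rough illustration of what a simulation-based p-value of this kind looks like, here is a generic Monte Carlo sketch. It is not the preprint's procedure: `monte_carlo_p_value` and `toy_hit_rate` are hypothetical names, and the toy simulator just draws a hit rate for 100 trials at a 25% chance rate with no QRPs modelled at all. In a real analysis the simulator would generate a whole database under the chosen QRP settings.

```python
import random


def monte_carlo_p_value(observed, simulate_statistic, n_sims=10_000, seed=0):
    """Estimate P(simulated statistic >= observed) under the simulated model.

    The +1 in numerator and denominator is the usual correction that keeps
    the estimate from being exactly zero with a finite number of simulations.
    """
    rng = random.Random(seed)
    exceed = sum(1 for _ in range(n_sims) if simulate_statistic(rng) >= observed)
    return (exceed + 1) / (n_sims + 1)


def toy_hit_rate(rng, n_trials=100, p_hit=0.25):
    """Toy simulator (an assumption for illustration): hit rate under chance."""
    hits = sum(1 for _ in range(n_trials) if rng.random() < p_hit)
    return hits / n_trials


if __name__ == "__main__":
    # An observed hit rate of 0.32 is a made-up value, not a figure from the data.
    print(monte_carlo_p_value(0.32, toy_hit_rate))
```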

The claim is that the way in which the p-value frequency distribution is divided up into "bins", together with the use of other statistics, makes the method more able to discriminate between genuine effects and effects of QRPs. So - in contrast to the conclusion of Bierman, Spottiswoode and Bijl - the overall p-value for the Ganzfeld studies is 0.053, which falls just short of significance at the 5% level, suggesting that, even assuming the optimal combination of QRPs, the observed data are unlikely to have arisen from chance and QRPs alone.

This seems like a potentially useful technique, but there are a couple of things I am concerned about. One is a technical concern - from the description in the preprint it sounds as though the p-values obtained from the frequencies for different ranges of p-values, and the p-value for the effect size, are combined as though they were statistically independent. I don't see why they should be statistically independent, so it looks as though this may artificially reduce the overall p-value, making the QRPs seem a less plausible explanation of the observed results.
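
To show why this matters, here is a small simulation sketch. It assumes the two p-values are combined using Fisher's method (I don't know that this is the rule the preprint actually uses) and that the two underlying statistics are positively correlated with a hypothetical correlation `rho`. Both statistics are generated under the null, so any combined p-value below 0.05 is a false positive.

```python
import math
import random


def upper_tail_p(z):
    """One-sided p-value for a standard normal z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def fisher_two(p1, p2):
    """Fisher's combined p-value for two p-values, assuming independence.

    For two tests the chi-square (4 d.f.) tail has the closed form
    p1*p2 * (1 - ln(p1*p2)).
    """
    prod = max(p1 * p2, 1e-300)  # guard against log(0)
    return prod * (1.0 - math.log(prod))


def false_positive_rate(rho, n_trials=100_000, alpha=0.05, seed=0):
    """Fraction of null simulations in which the combined p-value is < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        z1 = rng.gauss(0.0, 1.0)
        # z2 has correlation rho with z1 and is still standard normal.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if fisher_two(upper_tail_p(z1), upper_tail_p(z2)) < alpha:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.8):
        print(rho, false_positive_rate(rho))
```

With `rho = 0` the false positive rate should sit near the nominal 5%, while positive correlation pushes it above that, which is the sense in which treating dependent quantities as independent can make the combined p-value too small.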

More seriously, when the author says "a broad set of Qrps fails to account for the Ganzfeld data, even if these are used in maximal combination and are adopted by researchers at frequencies approaching 100%", it's important to understand that the 100% frequency doesn't mean the QRPs are attaining their greatest possible size. It means they are attaining the (arbitrary) size deemed to be plausible when modelling the QRPs. For example, the publication bias model is that studies with a p-value of 0.3 or more would be published with a probability of only about 0.5, while studies with a very small p-value would almost always be published, with a smooth transition between the two. That is the maximum amount of publication bias considered in the model. But, of course, in theory there could be a greater degree of publication bias than this.
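
Here is a minimal sketch of a publication filter of the kind described above. The functional form is my own assumption (an exponential decay from 1 towards a floor, with a hypothetical `scale` of 0.05), chosen only so that studies with very small p-values are almost always published and those with p of 0.3 or more are published with probability close to the floor. Lowering `floor` below 0.5 represents a stronger publication bias than the model's maximum.

```python
import math
import random


def publication_probability(p, floor=0.5, scale=0.05):
    """Assumed smooth publication curve: 1 at p = 0, about `floor` for p >= 0.3."""
    return floor + (1.0 - floor) * math.exp(-p / scale)


def published_significant_fraction(floor, n_studies=100_000, seed=1):
    """Among published null studies, the fraction with p < 0.05."""
    rng = random.Random(seed)
    published = 0
    significant = 0
    for _ in range(n_studies):
        p = rng.random()  # under the null, p-values are uniform on [0, 1)
        if rng.random() < publication_probability(p, floor=floor):
            published += 1
            if p < 0.05:
                significant += 1
    return significant / published


if __name__ == "__main__":
    for floor in (1.0, 0.5, 0.1):
        print(floor, published_significant_fraction(floor))
```

With `floor = 1.0` there is no bias and about 5% of published null studies are significant; the fraction rises as the floor is lowered, so the choice of floor directly limits how much apparent effect the model can attribute to publication bias.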

So ultimately this model still depends on a subjective assessment of how common QRPs are. What would be interesting would be a model which didn't restrict the strength of QRPs in this way. Perhaps that might show that the predictions based on QRPs were still inconsistent with observations. I doubt the current model would be powerful enough to say that. Perhaps Peter Bancel is right that the method could be made more powerful, but I think in that case the issue of statistical dependence between the observed quantities would need to be considered carefully.