To continue (on the basis that this is a useful exercise, at least for me).
Going on to the more specific criticisms of the papers by Mossbridge et al., made by Schwarzkopf in 2014, these are to be found briefly in a published paper and at greater length in a blog post:
https://www.frontiersin.org/articles/10....00332/full
https://figshare.com/articles/Why_presen...ed/1021480
Actually, these also include some more general arguments: for example, that precognition would violate the second law of thermodynamics or the principles of theoretical physics; that statistics can demonstrate only a difference from the null hypothesis, not what the underlying process is; or that if presentiment existed the entire body of evidence in neuroscience would have to be re-analysed. Obviously the first of these has been debated quite a lot, including by physicists (Schwarzkopf is a psychologist), and I don't think the others cast any doubt on the reality of the phenomenon in themselves.
On the presentiment meta-analysis, Schwarzkopf makes two points I broadly agree with:
(1) A meta-analysis is only as good as the primary studies it's based on. He criticises the quality of one in particular, by Bierman and Scholte, though apparently he slips up in suggesting that non-peer-reviewed conference proceedings were also included. (In their response, Mossbridge et al. say all the conference proceedings were peer-reviewed.) But obviously it's right that all meta-analyses are vulnerable to problems with the original studies (just as they are vulnerable to selective reporting).
(2) A potential problem with the presentiment studies is that the physiological variables measured just before each stimulus depend on all the previous stimuli, and this can lead to statistical artefacts (usually referred to as expectation bias in the literature). These effects have been estimated by modelling, but Schwarzkopf doesn't think this is satisfactory. He favours trying to measure the effects directly. (I agree that modelling is unsatisfactory, but I think attempts at direct measurement are too. I think what's needed is either to modify the experimental protocol, or to analyse the data in a way that eliminates the bias.)
Unfortunately most of Schwarzkopf's other points about the statistics don't seem well founded to me:
(1) Schwarzkopf suggests including data from conventional studies which may show the same effect. But the problem is that if such studies use a counterbalanced design, in which the nature of later stimuli is predictable from earlier ones, then that can worsen expectation bias. Schwarzkopf tries to dismiss these concerns, claiming that there is no problem unless subjects are aware of the counterbalanced design.
But as Mossbridge et al. point out in their response, that's a fallacy. There will generally be a bias in the subjects' response based on past stimuli, even if they know nothing of the experimental design. If in addition the nature of the later stimuli depends on the earlier ones, rather than being independently randomised, then this dependence, combined with the bias in the subjects' response, can produce a spurious difference between calm and arousing trials.
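To make the mechanism concrete, here is a toy simulation of my own (not any protocol or analysis from the papers; all parameter values are invented for illustration). Subjects follow a gambler's-fallacy rule: anticipatory arousal grows with the current run of calm trials, and is fixed before each stimulus is revealed. When the stimuli are counterbalanced (a fixed quota of arousing trials per session, shuffled), a long calm run really does make an arousing trial more likely next, so the anticipation correlates with the upcoming stimulus and a spurious "presentiment" difference appears; with independent randomisation it vanishes.

```python
import random

def run_session(counterbalanced, rng, n_trials=40, n_arousing=10):
    # Build one session's stimulus sequence (True = arousing, False = calm).
    if counterbalanced:
        # Fixed quota of arousing trials, shuffled: after a long calm run an
        # arousing trial becomes more likely (sampling without replacement).
        stimuli = [True] * n_arousing + [False] * (n_trials - n_arousing)
        rng.shuffle(stimuli)
    else:
        # Independently randomised: each trial is arousing with fixed probability.
        p = n_arousing / n_trials
        stimuli = [rng.random() < p for _ in range(n_trials)]
    pre_calm, pre_arousing = [], []
    calm_run = 0
    for s in stimuli:
        # Gambler's-fallacy model: anticipatory arousal grows with the current
        # run of calm trials, and is fixed BEFORE the stimulus is revealed.
        response = calm_run + rng.gauss(0, 1)
        (pre_arousing if s else pre_calm).append(response)
        calm_run = 0 if s else calm_run + 1
    return pre_calm, pre_arousing

def pooled_diff(counterbalanced, n_sessions=4000, seed=0):
    # Mean pre-stimulus response before arousing trials minus before calm
    # trials, pooled over all sessions.
    rng = random.Random(seed)
    all_calm, all_arousing = [], []
    for _ in range(n_sessions):
        calm, arousing = run_session(counterbalanced, rng)
        all_calm += calm
        all_arousing += arousing
    return (sum(all_arousing) / len(all_arousing)
            - sum(all_calm) / len(all_calm))
```

With these made-up numbers the counterbalanced design shows a pooled arousing-minus-calm difference of roughly 0.3 units, even though the model contains no presentiment at all, while the independently randomised design stays near zero.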
(2) Schwarzkopf thinks that because there is generally a larger proportion of calm trials than arousing trials, subjects will come to expect calm trials, and will be right most of the time. So the nature of the stimuli will not really be unpredictable.
That's obviously a fallacy. The experiment doesn't measure whether subjects can predict the nature of the trial more than 50% of the time. It measures the difference between the subjects' response before arousing stimuli and the response in their absence. An overall expectation that all the trials are likelier to be calm than arousing clearly can't produce a significant difference of that kind.
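The point can be checked with an even simpler sketch (again entirely invented for illustration): give every trial the same "probably calm" anticipatory bias, determined before the stimulus is chosen, and the calm/arousing difference comes out at zero apart from sampling noise, however unequal the base rates are.

```python
import random

def constant_expectation_diff(p_arousing=0.25, n_trials=100_000, seed=1):
    # Every pre-stimulus response carries the same "probably calm" bias;
    # the stimulus type is chosen independently afterwards.
    rng = random.Random(seed)
    pre_calm, pre_arousing = [], []
    for _ in range(n_trials):
        response = 0.8 + rng.gauss(0, 1)   # uniform anticipatory bias
        if rng.random() < p_arousing:
            pre_arousing.append(response)
        else:
            pre_calm.append(response)
    return (sum(pre_arousing) / len(pre_arousing)
            - sum(pre_calm) / len(pre_calm))
```

A uniform expectation shifts both groups of trials by the same amount, so it cancels in the comparison.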
(3) The effects measured could be artefacts of signal filtering during pre-processing.
This possibility was discussed in the original meta-analysis, where it was noted that high-pass filtering could produce a pre-stimulus artefact, but that it would be in the opposite direction to that expected for presentiment. It's stated that only two of the studies included in the meta-analysis used high-pass filtering, and only one of the filters would be vulnerable to such an artefact. It's not clear whether Schwarzkopf had any particular reason to think that filtering artefacts might be a problem in the other studies.
(4) It has been suggested that the problem of expectation bias could be eliminated by running only one trial per subject. Schwarzkopf suggests there would still be expectation bias, because of random differences between the subjects selected for the trials.
That's not what's normally meant by expectation bias in this context. Differences between subjects would have no tendency to produce significant differences between the responses to randomly selected calm and arousing stimuli, so there would be no bias in that sense.
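A toy sketch (hypothetical parameters, not any actual study design) makes this concrete: give each subject a single trial and an idiosyncratic baseline, randomise the stimulus independently of the subject, and the between-subject variability adds noise but no systematic calm/arousing difference.

```python
import random

def single_trial_diff(n_subjects=50_000, seed=2):
    rng = random.Random(seed)
    calm, arousing = [], []
    for _ in range(n_subjects):
        # Each subject contributes one trial; baselines differ widely
        # across subjects.
        baseline = rng.gauss(0, 3)          # idiosyncratic subject level
        response = baseline + rng.gauss(0, 1)
        # Stimulus chosen independently of the subject and the response.
        if rng.random() < 0.25:
            arousing.append(response)
        else:
            calm.append(response)
    return sum(arousing) / len(arousing) - sum(calm) / len(calm)
```

Because the stimulus assignment is independent of each subject's characteristics, subject differences inflate the variance but cannot shift the calm and arousing groups relative to one another.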
(5) Schwarzkopf expresses concern that sometimes the baseline for measurements is established using a single point rather than a range of values, leading to greater variability of the data, and that sometimes a baseline is defined on a trial-by-trial basis, so that its position can be influenced by a residual response to the previous trial.
The objection to the method of fixing a single baseline for a whole session of trials is hard to understand, because typically what is analysed is the average difference between calm and arousing trials in each session, which is independent of the baseline. Even if another analysis method were used in which the position of the baseline mattered - and if the variability of the data were increased - the effect would not be to produce a spurious difference between calm and arousing trials, but to tend to obscure any genuine difference.
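The algebra here is trivial, but to spell it out in code (a made-up example, with arbitrary data and an arbitrary baseline value): subtracting any fixed per-session baseline from all the trials in a session leaves the calm/arousing difference for that session exactly unchanged.

```python
import random

def session_diff(responses, stimuli, baseline=0.0):
    # Subtract one fixed per-session baseline, then compare the two means.
    corrected = [r - baseline for r in responses]
    arousing = [r for r, s in zip(corrected, stimuli) if s]
    calm = [r for r, s in zip(corrected, stimuli) if not s]
    return sum(arousing) / len(arousing) - sum(calm) / len(calm)

rng = random.Random(3)
responses = [rng.gauss(0, 1) for _ in range(40)]
stimuli = [i % 4 == 0 for i in range(40)]  # 10 arousing, 30 calm
```

Whatever constant is chosen, it is subtracted from both group means and cancels, so a session-wide baseline cannot create a spurious difference.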
Certainly if trial-by-trial baselines were used, and the position of the baseline were influenced by the response to the preceding trial, that would be undesirable. But it would be only a particular example of the general dependence of the response to each trial on all the preceding trials (as in expectation bias). Regardless of how baselines are chosen, the effect of that dependence needs to be eliminated, whether by using an estimate of its size, by modifying the protocol to use single trials, or by using an appropriate analysis technique. (It seems that trial-by-trial baselines were used in only 2 of the 26 studies included in the meta-analysis, though for some reason Schwarzkopf describes these two as "many of these studies".)