Apart from not being able to make head or tail of that particular distinction, I'm afraid I don't find this paper in general very clear or useful.
They seem to think that because they don't find a difference to be statistically significant, they can conclude that there is no difference. For example, they explicitly conclude that dream ESP is independent of laboratory [notably Maimonides versus non-Maimonides], REM monitoring, target type (static or dynamic) and "perhaps" the number of choices in the judging set.
But of course, the lack of statistical significance may just be because the numbers aren't big enough. The fact is that the effect size in the Maimonides studies is well over twice that in the non-Maimonides studies, and when the comparison is analysed in terms of effect size, the p value is 0.055 (two-tailed). Similarly, the effect size for dynamic targets is twice that for static targets, and the p value is 0.068 (one-tailed). Studies with REM monitoring have an effect size 50% bigger than those without. Of course, those values aren't sufficient to conclude definitely that there's a difference. But that doesn't mean it can be concluded that there isn't a difference! If anything, the figures suggest quite strongly that there is.
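To put a rough number on the "numbers aren't big enough" point, here is a minimal sketch in Python of how little power a comparison between two smallish groups of studies can have. Everything in it is an assumption for illustration: the function name `power_two_group`, the normal approximation, and the study counts, spread and true difference, none of which are taken from the paper.

```python
import math
from statistics import NormalDist


def power_two_group(diff, sd, k1, k2, alpha=0.05):
    """Approximate power of a two-tailed z test comparing the mean effect
    sizes of two groups of studies.

    diff   : assumed true difference between the group means (hypothetical)
    sd     : assumed between-study standard deviation, same in both groups
    k1, k2 : number of studies in each group
    """
    nd = NormalDist()
    se = sd * math.sqrt(1 / k1 + 1 / k2)   # standard error of the difference
    z_crit = nd.inv_cdf(1 - alpha / 2)     # two-tailed critical value
    shift = diff / se                      # true difference in SE units
    # Probability that the observed z falls in either rejection region.
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)


# Hypothetical scenario: a real difference of 0.2 in effect size,
# between-study SD of 0.3, and groups of 15 and 30 studies.
power = power_two_group(diff=0.2, sd=0.3, k1=15, k2=30)
print(f"chance of reaching p < 0.05 when the difference is real: {power:.2f}")
```

With made-up inputs like these, a perfectly real difference is missed a large fraction of the time, so a non-significant result by itself says very little about whether the groups differ.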
It doesn't help that as well as testing whether effect sizes were different in different groups, they test whether Z values were different. But seeing that the Z value depends on the study size, and the average study sizes vary between the groups they're comparing, I can't see that that tells them anything useful.
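To illustrate why, here is a small sketch in Python. It assumes the common convention that a study's effect size is its Z value divided by the square root of the number of trials (I'm assuming that is the definition in play here); the function name and the numbers are made up for the example.

```python
import math


def z_from_effect_size(es, n_trials):
    """Z value implied by an effect size, assuming ES = Z / sqrt(N)."""
    return es * math.sqrt(n_trials)


# Two hypothetical studies with exactly the same effect size,
# differing only in the number of trials.
es = 0.2
z_small = z_from_effect_size(es, 25)    # smaller study
z_large = z_from_effect_size(es, 100)   # study four times as large

print(z_small, z_large)
```

Under that assumption the larger study has twice the Z value of the smaller one despite an identical effect size, so a difference in Z between two groups may reflect nothing more than a difference in typical study size.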
They don't appear to do a comparison between selected and unselected subjects. (Except for the "single subject" versus "multiple perceiver" comparison, which is meant to be something to do with "star subjects". But presumably some of the studies with more than one participant were selective. And perhaps some of the single-participant studies were non-selective?)
It seems to me there most probably is an important difference between the Maimonides and non-Maimonides studies. In the conclusion, they say that two of the authors, Sherwood and Roe (2013), previously concluded that there was such a difference, and attributed it to considerations of procedure rather than quality. But they also say that according to their ratings the Maimonides studies were worse in quality than the non-Maimonides ones (and they want to conclude that most of the procedural differences are unimportant). Whether the previous interpretation is true, or whether the one suggested by this paper is true, strikes me as the single most important issue in relation to these dream studies, but this paper doesn't address it directly at all.