Call for retraction of "Feeling the Future"


(2018-01-10, 09:38 AM)Desperado Wrote: Well, which do you have your money on, Iyace? You've been scouting around these forums for a long time, and I do see you as a proponent of some kind who is also fairly skeptical. You don't really think that parapsychology has a good chance of just being fraud?

I know you are making a point, and technically there is always a possibility of something, but one out-of-context quote from Bem isn't any more significant to me than it would be from anybody else. When skeptics have tried to fudge things in the past, they have always been found out pretty quickly. I've seen no convincing evidence that parapsychologists, especially the more respected and serious ones, are just tweaking the data to get the results they wanted. With all the dozens of ways word of fraudulent activity on that grand a scale could get out, I don't find it probable at all.

I mean, there are frauds everywhere, but is parapsychology all fraud? I can't for the life of me agree to that. Not saying you do, but the fraud card is useless to me at this point.

I wouldn’t jump to fraud. Perhaps if psi has been ‘proven true’ by the standards of current science, it may just be that we need a higher standard.

I think that is what Iyace is saying.
(2018-01-10, 05:54 PM)malf Wrote: I wouldn’t jump to fraud. Perhaps if psi has been ‘proven true’ by the standards of current science, it may just be that we need a higher standard.

I think we need to be clear that if the suggestion is that Bem ran a large number of experiments and then selected for publication a small fraction of them that were successful, then that does imply that he was not honest in his presentation of the work.

This is what Bem wrote about unpublished work in his paper:

The File Drawer 

Like most social-psychological experiments, the experiments reported here required extensive pilot testing. As all research psychologists know, many procedures are tried and discarded during this process. This raises the question of how much of this pilot exploration should be reported to avoid the file-drawer problem, the selective suppression of negative or null results. 

This problem arose most acutely in our two earliest experiments, the retroactive habituation studies, because they required the most extensive pilot testing and served to set the basic parameters and procedures for all the subsequent experiments. I can identify three sets of findings omitted from this report so far that should be mentioned lest they continue to languish in the file drawer. 

First, several individual-difference variables that had been reported in the psi literature to predict psi performance were pilot tested in these two experiments, including openness to experience; belief in psi; belief that one has had some psi experiences in everyday life; and practicing a mental discipline such as meditation, yoga, self-hypnosis, or biofeedback. None of them reliably predicted psi performance, even before application of a Bonferroni correction for multiple tests. Second, an individual-difference variable (negative reactivity) that I reported as a correlate of psi in my convention presentation of these experiments (Bem, 2003) failed to emerge as significant in the final overall database. 

Finally, as also reported in Bem (2003), I ran a small retroactive habituation experiment that used supraliminal rather than subliminal exposures. It was conducted as a matter of curiosity after the regular (subliminal) experiment and its replication had been successfully completed. It yielded chance findings for both negative and erotic trials. 

Clearly that is not consistent with Bem having suppressed 90-95% of his data when he came to publish.
(2018-01-10, 07:03 PM)Chris Wrote: I think we need to be clear that if the suggestion is that Bem ran a large number of experiments and then selected for publication a small fraction of them that were successful, then that does imply that he was not honest in his presentation of the work.

This is what Bem wrote about unpublished work in his paper:

The File Drawer 

Like most social-psychological experiments, the experiments reported here required extensive pilot testing. As all research psychologists know, many procedures are tried and discarded during this process. This raises the question of how much of this pilot exploration should be reported to avoid the file-drawer problem, the selective suppression of negative or null results. 

This problem arose most acutely in our two earliest experiments, the retroactive habituation studies, because they required the most extensive pilot testing and served to set the basic parameters and procedures for all the subsequent experiments. I can identify three sets of findings omitted from this report so far that should be mentioned lest they continue to languish in the file drawer. 

First, several individual-difference variables that had been reported in the psi literature to predict psi performance were pilot tested in these two experiments, including openness to experience; belief in psi; belief that one has had some psi experiences in everyday life; and practicing a mental discipline such as meditation, yoga, self-hypnosis, or biofeedback. None of them reliably predicted psi performance, even before application of a Bonferroni correction for multiple tests. Second, an individual-difference variable (negative reactivity) that I reported as a correlate of psi in my convention presentation of these experiments (Bem, 2003) failed to emerge as significant in the final overall database. 

Finally, as also reported in Bem (2003), I ran a small retroactive habituation experiment that used supraliminal rather than subliminal exposures. It was conducted as a matter of curiosity after the regular (subliminal) experiment and its replication had been successfully completed. It yielded chance findings for both negative and erotic trials. 

Clearly that is not consistent with Bem having suppressed 90-95% of his data when he came to publish.

I think Iyace was considering the field in general; maybe we should wait till he's back.
(2018-01-10, 05:54 PM)malf Wrote: I wouldn’t jump to fraud. Perhaps if psi has been ‘proven true’ by the standards of current science, it may just be that we need a higher standard.

Well, then my question would be: what's wrong with the "current standards of science"? There are flaws in the small sample sizes some areas of soft science rely on in their papers and studies, but work that isn't replicated doesn't survive. So we definitely need to increase sample sizes, so that we don't waste so much money on studies that are too underpowered to yield any meaningful conclusions.

However, that really isn't a question of the standards of science, but rather the standards of science media. Outlets tend to hype up work that is fresh out of the gate, and half the time they exaggerate its findings far beyond what the actual researchers claim in the paper!

To me:

Study sizes and methods certainly need to change, and the standards of some science outlets need to as well. The standards of science overall on what is accepted? Not so much.

Given what we are talking about, I'm sorry, but I see it as just goalpost-shifting on the part of people always looking for a way out of accepting even the possibility that psi exists. Not pointing fingers, as nobody here seems to be seriously advancing that idea rather than just suggesting it. I'm just saying.
(This post was last modified: 2018-01-11, 06:47 AM by Desperado.)
My questions finally appeared in the comments section, and Prof. Schimmack responded to them. I tried to post a reply to one of his responses - the one about estimating the total number of trials that would be required in his scenario - but again I'm not sure it got through. I'll post it here for safety, just in case it vanished into the ether.

My original question:
https://replicationindex.wordpress.com/2...mment-3346

Prof. Schimmack's response:
https://replicationindex.wordpress.com/2...mment-3369

My reply:

Thank you for replying to my questions. I think this is really the crucial one - how many trials/participants would be needed to produce the published results, in your scenario?

I guess you mean 290 per experiment, not 390, for a total of 2,900 trials?

But my problem with your estimate is that a selected significant pilot experiment with N=10 would be unlikely to remain significant if 90 further non-selected trials were added to it (even if the preliminary experiment scored 10 out of 10). To end up with 9 significant N=100 experiments, you would need a further stage of selection at the end, in which a significant proportion of the N=100 studies was discarded.

In your article you note that, collectively, trials 51-100 are significant, and you see that as an indication that the later trials were also selected. A rough estimate based on the p values you give for trials 51-100 suggests that 50-70% of the studies might have to be discarded. I would guess that percentage would be larger if the range N=11-100 were considered. Of course, that would mean a corresponding multiple of your estimate of 2,900. Taking this into account, I'm not convinced the final answer would be much smaller than your estimate of 18,000 for selection of complete studies. Indeed, it might even be larger.

But in any case, I do think your discovery of a decline effect in these data is very interesting, and I hope ultimately it may well shed important light on what is going on.
There's still a problem with posting comments to the blog, apparently, so I'll say a bit more here, and then try to post a link on there [Edit: this was successful].

In his article, when discussing possible explanations for Bem's data, in the section entitled "Selective Reporting", Prof. Schimmack estimated the total number of 100-trial experiments that would have been needed to produce (on average) 9 that were significant at p<0.05. As p<0.05 occurs by chance one time in 20, the number of experiments would be 20 × 9 = 180, and the total number of trials would be 18,000. (Schimmack doesn't explicitly rule this possibility out, but I think the tenor of his comments suggests he finds it implausible. It seems very implausible to me, as it would involve almost 7 trials every weekday for a decade.)

In his reply to my question, he discusses a scenario in which pilot experiments, each with 10 trials, would be run; if they were significant at p<0.05 they would be continued with a further 90 trials, but otherwise they would be discarded. To end up with 9 significant 100-trial experiments, Schimmack's argument would again imply 20 × 9 = 180 experiments - but now pilots of only 10 trials each - and 9 continuations of 90 trials each, giving a total of 2,610 trials. (The figure he gave in his comment was different, because apparently there was an arithmetic slip, and the calculation was for 10 significant experiments, whereas Bem had 9.)
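For anyone who wants to check the arithmetic, both back-of-envelope figures can be reproduced in a few lines of Python (plain arithmetic only; nothing here goes beyond the numbers already quoted):

```python
# p < 0.05 occurs by chance 1 time in 20, so on average 20 experiments
# are needed per significant result, i.e. 180 for 9 of them.
experiments = 20 * 9
print(experiments * 100)             # full-study selection: 18,000 trials
print(18_000 / (10 * 52 * 5))        # about 6.9 trials per weekday for a decade
print(experiments * 10 + 9 * 90)     # 180 pilots + 9 continuations: 2,610 trials
```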

The problem with this estimate is that it implicitly assumes that if a pilot experiment of 10 trials is significant at p<0.05, then it will remain significant when 90 continuation trials are added to make a total of 100. Obviously that's not the case. Only a certain proportion of the selected pilot experiments would remain significant. The figure of 2610 is therefore an underestimate of the total number of trials required in this scenario.

If we assume all the trials are just binary choices with equal chances of success and failure (as most of Bem's were), then we can use the binomial distribution to allow for the loss of significance during the continuation phase, and work out the correct number of trials required. So that anyone interested can check my calculation, the details are given below. In doing the calculation, I have slightly relaxed the criterion of significance for the pilot phase, to p<0.055. (Unless this is done, the criterion requires 9 or more successes out of 10, which occurs in only about 1% of cases, and that pushes the total number of trials required up above 50,000.)
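As a quick check on that parenthetical (this assumes SciPy is available; binom.sf(k, n, p) gives the probability of more than k successes):

```python
from scipy.stats import binom

# 10 trials at chance (p = 0.5), one-tailed:
print(binom.sf(8, 10, 0.5))   # P(9 or more successes) = 0.0107, about 1%
print(binom.sf(7, 10, 0.5))   # P(8 or more successes) = 0.0547, i.e. p < 0.055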

The answer is that in total about 18,400 trials are needed for the two-phase selection process involving pilot experiments with N=10.

Although this is similar to the figure in Schimmack's article for the scenario in which full experiments of 100 trials are run and then selected for significance, it should really be compared with the result of an exact binomial calculation for that case (because significance requires 59 or more successes out of 100, which happens only about 4.4% of the time). That gives a larger figure - about 20,300 trials. So the two-phase selection process is slightly more economical in terms of trials, but by only about 10%. 

It might be asked whether a different choice for the size of the pilot experiments would reduce the number of trials required. But a corresponding calculation for pilots with 20 trials instead of 10 (with a similar tweak of the significance criterion in the pilot phase, to p<0.058), gives a similar figure of about 20,400 trials.

Therefore it seems that Schimmack's scenario, involving the selection of pilot experiments, would involve no significant saving in terms of the total number of trials, and therefore - to my mind - does not offer a plausible explanation for Bem's results.

....................................................................................

[Binomial probabilities from http://stattrek.com/online-calculator/binomial.aspx]

For pilot experiments of 10 trials, with significance criterion p<0.055, there are three possible outcomes for significance:
(a) 10 out of 10, with probability 0.000977.
(b) 9 out of 10, with probability 0.009766.
(c) 8 out of 10, with probability 0.043945.
The total probability of significance for the pilot experiment is therefore 0.05469.

For significance at p<0.05 in a full experiment of 100, we require 59 or more successful trials (the probability of 59 or more successes is 0.04431).

So in these three cases, for the continuations of 90 trials, the possible outcomes for significance are:
(a) 49 or more out of 90, with probability 0.230396.
(b) 50 or more out of 90, with probability 0.171417.
(c) 51 or more out of 90, with probability 0.123053.
Multiplying each of these by the corresponding pilot probability above and summing, we find the total probability of significance in both phases is 0.007307.

Scaling these numbers up, the number of experiments needed for the expected number of significant results to be 1 is 136.9 pilots of 10 trials and 7.485 continuations of 90 trials, totalling 2,043 trials.

Therefore for an expectation of 9 significant results, the total number of trials required would be about 18,400.
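For anyone who would rather not work through the online calculator step by step, here is a minimal Python sketch of the same calculation (it assumes SciPy; the function name and parameter names are mine):

```python
from scipy.stats import binom

def expected_trials(pilot_n, pilot_min_hits, total_n=100, total_min_hits=59,
                    target=9, p=0.5):
    """Expected total trials to get `target` significant full experiments,
    when pilots of `pilot_n` trials are continued to `total_n` trials only
    if they score at least `pilot_min_hits` successes."""
    cont_n = total_n - pilot_n
    # binom.sf(k, n, p) is P(X > k), so sf(k - 1, ...) is P(X >= k)
    p_select = binom.sf(pilot_min_hits - 1, pilot_n, p)
    # P(pilot selected AND full experiment significant), summed over the
    # possible pilot scores
    p_sig = sum(binom.pmf(k, pilot_n, p) *
                binom.sf(total_min_hits - k - 1, cont_n, p)
                for k in range(pilot_min_hits, pilot_n + 1))
    pilots = target / p_sig                      # expected pilots needed
    return pilots * (pilot_n + p_select * cont_n)

print(round(expected_trials(10, 8)))    # 10-trial pilots at p < 0.055: ~18,400
print(round(expected_trials(20, 14)))   # 20-trial pilots at p < 0.058: ~20,400
print(round(9 / binom.sf(58, 100, 0.5) * 100))   # whole-study selection: ~20,300
```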
I think this will be the final comment I post on Prof. Schimmack's blog. It's currently awaiting moderation, so I'll post it here too:

Thank you for your reply.

As I mentioned, an outline of my calculation is on the page I linked to, so that the details can be seen if they are of interest, and checked if necessary. It wasn't really a simulation, but an exact calculation based on the binomial probabilities. I calculated the number of pilot experiments that would be needed to make the expected number of significant full experiments (at p<0.05) equal to 9. I then calculated the expected total number of trials corresponding to that number of pilot experiments (not conditioned on the end result). The significance criteria were one-tailed, as you had suggested.

The only change I made to your prescription - in an attempt to be as fair as possible to your hypothesis - was a slight relaxation of the significance criterion for the pilot experiments, to p<0.055. That meant that a result of 8 or more successes out of 10 was counted as significant. (Applying p<0.05 strictly would have meant that 9 or more successes were required, which would actually represent p<0.01, and would result in the total number of trials required being more than 50,000.)

I agree that we don't need calculations to tell us that the significance of pilot experiments will usually be diluted by 90 further trials. The point was that your illustrative calculation didn't take that effect into account, and therefore underestimated the number of trials required. I did the calculation to find out how serious the underestimate was.

In addition to considering the parameters you had suggested, I did also consider pilot studies with 20 trials (similarly relaxing the criterion for pilot studies to p<0.058), but that calculation indicated that even more trials - about 20,400 - would be required than for the 10-trial pilots.

The figure can be reduced somewhat by relaxing the significance criterion for the pilot studies further. For 10-trial pilot experiments, the optimal p value is 0.17. That implies that about 13,300 trials would be required. But I don't think such a large p value would be consistent with your finding that, considering the first 15 trials, 9 out of the 10 experiments had reached significance (presumably meaning p<0.05) by that point, after 7.75 trials on average.

As to whether we need to assume there is a real effect, when I think about these matters I try to steer clear of unnecessary assumptions, either pro or con. The purpose of the calculations was to gauge whether the scenario you described in your article provided a plausible explanation for Bem's results. On the numbers, I don't think it does. No doubt other explanations will be suggested in the future, as they have been in the past, but my feeling is that the strong decline effect you have shown makes explanations in non-paranormal terms more difficult, not easier.

However, I do feel the decline effect is interesting and important, so I congratulate you on discovering it, and thank you for making it known.
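On the "optimal p value" point in that comment, a small sweep over the pilot selection criterion (same assumptions and SciPy routines as the sketch in my earlier post) reproduces the numbers: the expected total is smallest, at about 13,300 trials, when the criterion is 7 or more successes out of 10 (p ≈ 0.17), and rises to over 50,000 when 9 out of 10 is required:

```python
from scipy.stats import binom

# Expected total trials when 10-trial pilots are continued to 100 only if
# they reach at least `min_hits` successes, and 9 significant full
# experiments (59+ of 100, p < 0.05 one-tailed) are required on average.
for min_hits in range(6, 11):
    p_pilot = binom.sf(min_hits - 1, 10, 0.5)        # pilot selection rate
    p_sig = sum(binom.pmf(k, 10, 0.5) * binom.sf(58 - k, 90, 0.5)
                for k in range(min_hits, 11))
    trials = 9 / p_sig * (10 + p_pilot * 90)
    print(f"{min_hits}+ of 10 (p < {p_pilot:.3f}): about {trials:,.0f} trials")
```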
(2018-01-10, 09:38 AM)Desperado Wrote: Well, which do you have your money on, Iyace? You've been scouting around these forums for a long time, and I do see you as a proponent of some kind who is also fairly skeptical. You don't really think that parapsychology has a good chance of just being fraud?

I know you are making a point, and technically there is always a possibility of something, but one out-of-context quote from Bem isn't any more significant to me than it would be from anybody else. When skeptics have tried to fudge things in the past, they have always been found out pretty quickly. I've seen no convincing evidence that parapsychologists, especially the more respected and serious ones, are just tweaking the data to get the results they wanted. With all the dozens of ways word of fraudulent activity on that grand a scale could get out, I don't find it probable at all.

I mean, there are frauds everywhere, but is parapsychology all fraud? I can't for the life of me agree to that. Not saying you do, but the fraud card is useless to me at this point.

Well, for starters, I haven't been 'scouting around these forums'. My positions are well documented on the old Skeptiko forum, to which I contributed a bit, and even more so on the old mind-energy forum (the first Skeptiko forum), where I contributed heavily, including analyses of various studies and protocols (ganzfeld, presentiment, etc.).

Never once did I mention fraud as being a way to account for the results. Parapsychology uses all standard techniques, and in some cases more granular ones, to demonstrate the efficacy of psi. Either parapsychologists are on to something, or those standard techniques have been crafting false positives for decades and need to be re-evaluated. If I thought it was fraud, I wouldn’t claim the techniques would need to be changed.
(This post was last modified: 2018-01-14, 05:54 PM by Iyace.)
Prof. Schimmack said he was waiting for further information from Daryl Bem, and our discussion ended with an implicit agreement that it would be interesting to investigate his data further.

Now I notice that some more comments have been posted under the name Linda, raising further concerns about Bem's experiments, essentially along the lines of multiple hypotheses. Similar concerns were discussed on Skeptiko a couple of years ago. My feeling is that - as the existence of multiple hypotheses doesn't tend to produce a decline effect - the strong decline effect discovered by Schimmack makes it much more difficult to explain Bem's data in these terms:
https://replicationindex.wordpress.com/2...mment-3397
