Call for retraction of "Feeling the Future"

Courtesy of the SPR Facebook page, here's a new post on the "Replicability Index", entitled "Why the Journal of Personality and Social Psychology Should Retract ... “Feeling the Future ...” by Daryl J. Bem":
https://replicationindex.wordpress.com/2...gnition-a/
The following 1 user Likes Guest's post:
  • laborde
Here's the comment I just tried to post on the blog:

I found this article much more reasonable than most of the criticisms of Daryl Bem's paper, but I'm puzzled by several things.

(1) You mention 10 attempted replications as collectively unsuccessful, but the meta-analysis you cite - Bem et al. (2015) - analysed 80 attempted replications, and claimed that as a whole they were successful even when Bem's own studies were excluded. Perhaps there's a reason you don't accept this claim, but shouldn't it be discussed?

(2) In your discussion of selective reporting as a potential explanation for Bem's results, you estimate that this would require 18,000 participants in total (95% of whose results would have to be discarded). This does seem implausible. But have you attempted to calculate corresponding figures for your alternative scenario of running large numbers of pilot studies and continuing only the most promising ones? Obviously this wouldn't require as many participants as running large numbers of full studies, but it seems to me the requirement would be of the same order - particularly to get an end result of 9 significant studies out of 10, because the promising results of the pilot studies would be diluted by the chance results of the continuations, so a more stringent criterion than 5% significance would need to be applied to the pilots. I wonder whether this explanation is really much more feasible than the other possibilities you reject.

(3) It's not really true to say that Bem didn't answer the question about what pilot explorations should be reported. Immediately after the paragraph you quote, he listed the three pilot explorations which he thought should be mentioned in the 2011 paper. Two concerned psychological variables that had been analysed in relation to some of the trials reported in the paper, but were later dropped. Apparently only one concerned trials which weren't reported in the 2011 paper - "a small retroactive habituation experiment that used supraliminal rather than subliminal exposures", which had been reported in 2003.

I was concerned to see the comments attributed to Bem in the 2017 online article by Engber, and I find them difficult to reconcile with what he wrote in 2011. As you have been in contact with Bem, have you checked whether he acknowledges the accuracy of those quoted comments, and if so, why he didn't feel it was relevant to mention the unsuccessful abandoned experiments? Surely we have to be clear that if he carried out and discarded many more unsuccessful trials than he published - as you envisage - then his presentation of the work in 2011 was not an honest one?
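
As a footnote to point (2): the dilution effect is easy to demonstrate with a rough Monte Carlo sketch (the pilot size, full-study size and 5% pilot criterion below are my own assumptions, purely for illustration). With these settings only a minority of the continued studies end up significant overall, which is why the pilot criterion would have to be much stricter - and the number of pilots correspondingly larger - to yield 9 significant studies out of 10:

```python
import random
from statistics import NormalDist

def p_value(hits, n):
    # One-sided p value for `hits` successes in `n` 50/50 trials
    # (normal approximation to the binomial).
    z = (hits - n / 2) / ((n / 4) ** 0.5)
    return 1 - NormalDist().cdf(z)

def pilot_selection(n_pilot=50, n_full=100, pilot_alpha=0.05, runs=100_000):
    """Simulate chance-level pilots; continue only the 'promising' ones
    to a full study, and count how often the completed study is still
    significant at the 5% level."""
    continued = significant = 0
    for _ in range(runs):
        pilot_hits = sum(random.random() < 0.5 for _ in range(n_pilot))
        if p_value(pilot_hits, n_pilot) >= pilot_alpha:
            continue  # unpromising pilot: abandoned and unpublished
        continued += 1
        extra = sum(random.random() < 0.5 for _ in range(n_full - n_pilot))
        if p_value(pilot_hits + extra, n_full) < 0.05:
            significant += 1
    return continued, significant

random.seed(1)
cont, sig = pilot_selection()
print(f"{cont} pilots continued; {sig} finished significant")
```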
The following 5 users Like Guest's post:
  • Sciborg_S_Patel, laborde, Obiwan, Doppelgänger, Laird
Thanks for bringing this to our attention, Chris. I struggled to make sense of the author's analysis. It seemed contradictory. His key finding seemed to be, under section 3, "The Decline Effect and a New Questionable Research Practice", that "The most plausible explanation is that successes are a selected set of findings out of many attempts that were not reported". This, though, directly contradicts the author's earlier dismissal of selective reporting as a plausible explanation of Daryl Bem's results. Section 2.8, "Selective reporting", is one of the "questionable research practices" referenced in the generalised conclusion, 2.9, in which the author writes, "In conclusion, none of the questionable research practices [i.e. including the possibility of selective reporting raised in section 2.8 --Laird] that have been identified by John et al. seem to be plausible explanations for Bem’s results".

So... selective reporting both is and isn't a plausible explanation for Daryl Bem's results? Is this a paradox that you can resolve for me, Chris?

(2018-01-07, 07:29 PM)Chris Wrote: You mention 10 attempted replications as collectively unsuccessful, but the meta-analysis you cite - Bem et al. (2015) - analysed 80 attempted replications, and claimed that as a whole they were successful even when Bem's own studies were excluded.

Clearly, this falls under section 2.8, "Selective reporting". I call for the author to retract his article calling for a retraction!
(This post was last modified: 2018-01-08, 07:14 AM by Laird.)
The following 4 users Like Laird's post:
  • Sciborg_S_Patel, laborde, Doug, Kamarling
(2018-01-08, 06:27 AM)Laird Wrote: Thanks for bringing this to our attention, Chris. I struggled to make sense of the author's analysis. It seemed contradictory. His key finding seemed to be, under section 3, "The Decline Effect and a New Questionable Research Practice", that "The most plausible explanation is that successes are a selected set of findings out of many attempts that were not reported". This, though, directly contradicts the author's earlier dismissal of selective reporting as a plausible explanation of Daryl Bem's results. Section 2.8, "Selective reporting", is one of the "questionable research practices" referenced in the generalised conclusion, 2.9, in which the author writes, "In conclusion, none of the questionable research practices [i.e. including the possibility of selective reporting raised in section 2.8 --Laird] that have been identified by John et al. seem to be plausible explanations for Bem’s results".

So... selective reporting both is and isn't a plausible explanation for Daryl Bem's results? Is this a paradox that you can resolve for me, Chris?

I think he's essentially considering two different kinds of selective reporting - one in which a series of complete studies is performed, and then the statistically significant ones are selected for publication, and another in which a series of smaller pilot studies is performed, and only the most successful ones are continued as full studies, while the others are abandoned and left unpublished. 

My attempt to post a comment last night doesn't seem to have worked. I've just tried again, and it's now "awaiting moderation", so we'll see what the response is.

I think the discovery of a strong decline effect within Bem's data is very significant in itself, and difficult to reconcile with a lot of the suggestions as to how his results could have been obtained from questionable research practices. 

The author of the blog is Professor Ulrich Schimmack, a psychologist at the University of Toronto Mississauga, by the way:
https://www.utm.utoronto.ca/psychology/f...ack-ulrich
The following 1 user Likes Guest's post:
  • Laird
(2018-01-08, 08:57 AM)Chris Wrote: I think he's essentially considering two different kinds of selective reporting - one in which a series of complete studies is performed, and then the statistically significant ones are selected for publication, and another in which a series of smaller pilot studies is performed, and only the most successful ones are continued as full studies, while the others are abandoned and left unpublished. 

Hmm. So, sort of an "optional stopping/snooping" strategy (per section 2.7), except in reverse: rather than stopping when you have a significant result, stopping when your pilot study finds a non-significant result, and then trying another study to find a significant result?

(2018-01-08, 08:57 AM)Chris Wrote: My attempt to post a comment last night doesn't seem to have worked. I've just tried again, and it's now "awaiting moderation", so we'll see what the response is.

Best of success. It deserves to be published.

(2018-01-08, 08:57 AM)Chris Wrote: I think the discovery of a strong decline effect within Bem's data is very significant in itself, and difficult to reconcile with a lot of the suggestions as to how his results could have been obtained from questionable research practices. 

Agreed. Ulrich writes in his section 1, "Luck", that "The luck hypothesis assumes that Bem got lucky 9 out of 10 times with a probability of 5% on each attempt. The probability of this event is very small. To be exact, it is 0.000000000019 or 1 out of 53,612,565,445".

Along similar lines, I wonder how many experiments would need to be attempted (including discarded failures) in order to obtain, by chance, as many experiments showing as strong a decline effect as those Daryl published (assuming a random distribution).

(2018-01-08, 08:57 AM)Chris Wrote: The author of the blog is Professor Ulrich Schimmack, a psychologist at the University of Toronto Mississauga, by the way:
https://www.utm.utoronto.ca/psychology/f...ack-ulrich

Well, I hope he's happy. ;-)
(This post was last modified: 2018-01-08, 09:20 AM by Laird.)
The following 1 user Likes Laird's post:
  • Doug
(2018-01-08, 09:16 AM)Laird Wrote: Best of success. It deserves to be published.

No sign of it appearing yet, so I'll carry on commenting here.

As I said, I do think Schimmack has discovered something quite interesting - and perhaps quite important - about Bem's data. The presence of a strong decline effect appears to be inconsistent with several of the suggestions as to how questionable research practices could have produced Daryl Bem's results. I think that includes a form of optional stopping, which I had thought might have been a factor. (As far as I can see, optional stopping would have the opposite effect to the one Schimmack has observed.)

However, the data presented by Schimmack also appear to argue against his own proposed explanation - that Bem might have carried out many small pilot studies, discarded the unsuccessful ones, and continued the successful ones as full studies. The idea is that although the continuation trials would have scored only at chance, the initial selection of successful pilot studies would bias the overall results towards success.

As I wrote in my comment, I suspect that strategy might still require an infeasibly large number of pilot studies to pick and choose from. But whether feasible or not from this point of view, the idea can be tested for consistency with the data in Schimmack's Table 1, in which he presents separate p values for each experiment and for trials 1-50 and 51-100. For the first 9 experiments, for which there were at least 100 trials, this shows that for trials 1-50, 7 out of 9 were significant at 5%, but for trials 51-100 none was significant. So far so good for Schimmack's idea.

However, these p values also allow us to combine the results from trials 51-100 for all 9 experiments. In Schimmack's scenario, the results of the later trials should be close to chance, because it's only the early trials in the pilot studies that would be selected according to how successful they were. But when I combined the p values for all trials numbered 51-100 (using Fisher's method), I came up with a p value of 0.02 - still statistically significant. That is not what would be expected in Schimmack's scenario.
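
In case anyone wants to check the combination, here's a minimal sketch of Fisher's method in Python. The p values listed are placeholders only - the real per-experiment figures for trials 51-100 are the ones in Schimmack's Table 1:

```python
from math import log
from scipy.stats import chi2

def fisher_combine(pvals):
    """Fisher's method: under the joint null hypothesis, the statistic
    -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom, where k is the number of independent p values."""
    stat = -2.0 * sum(log(p) for p in pvals)
    return chi2.sf(stat, df=2 * len(pvals))

# Placeholder p values for the 9 experiments' trials 51-100 - substitute
# the actual values from Schimmack's Table 1.
pvals = [0.20, 0.45, 0.08, 0.60, 0.12, 0.33, 0.05, 0.25, 0.50]
print(fisher_combine(pvals))
```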

In fact, given the strong decline effect Schimmack has discovered - combined with a smaller but still-significant effect in the later trials - I think the explanation of Bem's results in non-paranormal terms becomes a real challenge for the sceptics.
The following 2 users Like Guest's post:
  • laborde, Laird
(2018-01-08, 05:23 PM)Chris Wrote: In fact, given the strong decline effect Schimmack has discovered - combined with a smaller but still-significant effect in the later trials - I think the explanation of Bem's results in non-paranormal terms becomes a real challenge for the sceptics.

Oh dear. Has somebody proved the opposite of that which they were intending to prove?

I can’t say I’m hugely surprised by all of this. Notwithstanding the knee-jerk rejection from certain scientists who instantly reject any evidence in favour of psi, Bem has written material that effectively hands his opponents enough rope to hang him with.

There’s an article on his own website where he writes about how to get scientific papers published:

“... at least become intimately familiar with the record of their [the subjects of research] behavior: the data. Examine them from every angle. Analyze the sexes separately. Make up new composite indexes. If a datum suggests a new hypothesis, try to find additional evidence for it elsewhere in the data. If you see dim traces of interesting patterns, try to reorganize the data to bring them into bolder relief. If there are participants you don’t like, or trials, observers, or interviewers who gave you anomalous results, drop them (temporarily). Go on a fishing expedition for something – anything – interesting.”

http://dbem.org/WritingArticle.pdf

I believe it’s pretty common in science to look for patterns in the data and then write as if that was what you were looking for all along; but coming from the pen of someone trying to prove something as distasteful as ESP, it’s gold dust to some commentators.

Quite apart from the debate over whether or not psi exists, I don’t think the author of this piece has done enough to show that Bem’s paper deviates sufficiently from typical practice in psychology to warrant a retraction. At least, not without taking down a lot of other papers with it.
(This post was last modified: 2018-01-08, 06:51 PM by ersby.)
The following 3 users Like ersby's post:
  • laborde, malf, Doug
(2018-01-08, 06:51 PM)ersby Wrote: There’s an article on his own website where he writes about how to get scientific papers published

“... at least become intimately familiar with the record of their [the subjects of research] behavior: the data. Examine them from every angle. Analyze the sexes separately. Make up new composite indexes. If a datum suggests a new hypothesis, try to find additional evidence for it elsewhere in the data. If you see dim traces of interesting patterns, try to reorganize the data to bring them into bolder relief. If there are participants you don’t like, or trials, observers, or interviewers who gave you anomalous results, drop them (temporarily). Go on a fishing expedition for something – anything – interesting.”

http://dbem.org/WritingArticle.pdf

Yes - that paragraph is very frequently quoted by Bem's critics. But they never quote the following paragraph of that paper, in which he draws a clear distinction between exploratory and confirmatory studies, and makes it clear that he has just been talking about the former: 
No, this is not immoral. The rules of scientific and statistical inference that we overlearn in graduate school apply to the “Context of Justification.” They tell us what we can conclude in the articles we write for public consumption, and they give our readers criteria for deciding whether or not to believe us. But in the “Context of Discovery,” there are no formal rules, only heuristics or strategies. How does one discover a new phenomenon? Smell a good idea? Have a brilliant insight into behavior? Create a new theory? In the confining context of an empirical study, there is only one strategy for discovery: exploring the data.

I find the comments attributed to Bem in the online Slate article more worrying. But they seem so inconsistent with what he wrote about pilot explorations in the 2011 paper that I wonder if they too may have been taken out of context.
The following 5 users Like Guest's post:
  • Roberta, laborde, malf, Doug, Laird
Can someone explain how pilot study shopping can affect results without fudging the data?
