Psience Quest

Full Version: Call for retraction of "Feeling the Future"
Pages: 1 2 3 4 5 6

Chris

(2018-01-21, 07:58 AM)ersby Wrote: [ -> ]There's a new post on that blog about his email exchange with Bem.

https://replicationindex.wordpress.com/2...he-future/

Thanks for drawing this to our attention. I see this includes the data files for the experiments published in "Feeling the Future", which Daryl Bem has agreed to make freely available for analysis, and detailed information in the emails about the timing and composition of the experiments. 

There is also a comment from Bem about the issue of pilot experiments and selective reporting, flatly contradicting some of the suggestions that have been made:

"(One minor point: I did not spend $90,000 to conduct my experiments.  Almost all of the participants in my studies at Cornell were unpaid volunteers taking psychology courses that offered (or required) participation in laboratory experiments.  Nor did I discard failed experiments or make decisions on the basis of the results obtained.)

What I did do was spend a lot of time and effort preparing and discarding early versions of written instructions, stimulus sets and timing procedures.  These were pretested primarily on myself and my graduate assistants, who served repeatedly as pilot subjects. If instructions or procedures were judged to be too time consuming, confusing, or not arousing enough, they were changed before the formal experiments were begun on “real” participants.  Changes were not made on the basis of positive or negative results because we were only testing the procedures on ourselves.

When I did decide to change a formal experiment after I had started it, I reported it explicitly in my article. In several cases I wrote up the new trials as a modified replication of the prior experiment.  That’s why there are more experiments than phenomena in my article:  2 approach/avoidance experiments, 2 priming experiments, 3 habituation experiments, & 2 recall experiments.)"

Furthermore, it sounds as though Schimmack has plans to write up his analysis for formal publication, and as though a separate article on Bem's study 6 is about to appear on his blog.

It feels as though more useful information about Bem's experiments - particularly the decline effect - has emerged in the last couple of weeks than in the previous six years of sterile wrangling. Having said that, Bem himself seems more interested in organising pre-registered studies in the future than in reconsidering his original work.

Chris

A note of caution about the calculations I posted of the required number of trials to produce 9 significant experiments:

Looking again at Bem’s results, of course the number of binary trials isn’t equal to the number of participants, but is some multiple of it. Therefore it wasn’t appropriate to use the exact binomial probabilities in my calculations. It would have been better to use the normal distribution as an approximation, in working out the probability of statistical significance being maintained in going from a pilot experiment to a completed experiment. I believe the results of the calculations I posted should be roughly correct, but they shouldn’t be taken as exact.
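For what it's worth, the gap between the exact binomial criterion and its normal approximation can be checked directly. The sketch below is hypothetical code (not from the thread): it finds the smallest number of hits that is significant at the one-sided p = 0.05 level for a given number of chance-level binary trials, and compares the exact tail probability with a continuity-corrected normal approximation.

```python
from math import comb, erf, sqrt

def binom_tail(n, k, p=0.5):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_k(n, alpha=0.05, p=0.5):
    """Smallest hit count whose exact one-sided tail probability is <= alpha."""
    k = n
    while k > 0 and binom_tail(n, k - 1, p) <= alpha:
        k -= 1
    return k

def normal_tail(n, k, p=0.5):
    """Continuity-corrected normal approximation to P(X >= k)."""
    mu, sd = n * p, sqrt(n * p * (1 - p))
    z = (k - 0.5 - mu) / sd
    return 0.5 * (1 - erf(z / sqrt(2)))

for n in (100, 200):
    k = critical_k(n)
    print(f"N={n}: need {k} hits; exact tail {binom_tail(n, k):.4f}, "
          f"normal approx {normal_tail(n, k):.4f}")
```

For N = 100, for instance, the criterion is 59 hits and the exact tail probability is about 0.044 rather than 0.05 - with a discrete distribution the effective significance level sits somewhat below the nominal one.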

Chris

(2018-01-21, 10:10 AM)Chris Wrote: [ -> ]A note of caution about the calculations I posted of the required number of trials to produce 9 significant experiments:

I checked how sensitive the calculation was to the total number of binary trials, by comparing:
(i) pilot experiment of 10 binary trials and full study of N=100, and
(ii) pilot experiment of 20 binary trials and full study of N=200.

The answer is that most of the sensitivity comes from the fact that in a discrete distribution the p=0.05 criterion doesn't correspond to a whole number of successes, so in effect the 0.05 value has to be raised or lowered a bit. Once that effect has been factored out, the probability of a significant pilot experiment remaining significant when extended to a full experiment depends only very weakly on the total number of trials N. (The relative difference in the value between N=100 and N=200 is only just over 1%.)

But in estimating the required number of participants it would probably be fair to remove the effect of the discrete distributions on the p=0.05 criterion. That would lower the estimate of the total number of participants required somewhat - by roughly 5%, from 18,000 to 17,000 or so.
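As an illustration of the kind of calculation involved, here is a rough Monte Carlo sketch (hypothetical code). The pilot sizes of 10 and 20 trials and full-study sizes of 100 and 200 are the two cases compared above; the critical hit counts of 9/10, 15/20, 59/100 and 113/200 are assumed to be the approximate one-sided p = 0.05 thresholds for chance data. It estimates the probability that a pilot experiment significant at that level remains significant when extended to the full study.

```python
import random

def persist_prob(n_pilot, n_full, k_pilot, k_full, reps=20000, seed=1):
    """Monte Carlo estimate, for chance data (hit probability 0.5), of
    P(full study significant | pilot significant).

    k_pilot and k_full are the critical hit counts at (roughly) the
    one-sided p = 0.05 level for the pilot and the full study."""
    rng = random.Random(seed)
    sig_pilot = sig_both = 0
    for _ in range(reps):
        trials = [rng.random() < 0.5 for _ in range(n_full)]
        if sum(trials[:n_pilot]) >= k_pilot:
            sig_pilot += 1
            if sum(trials) >= k_full:
                sig_both += 1
    return sig_both / max(sig_pilot, 1)

# (i) pilot of 10 binary trials, full study of N = 100
# (ii) pilot of 20 binary trials, full study of N = 200
p1 = persist_prob(10, 100, 9, 59)
p2 = persist_prob(20, 200, 15, 113)
print(f"N=100: {p1:.3f}   N=200: {p2:.3f}")
```

Because the critical counts 9/10 and 15/20 correspond to quite different exact tail probabilities (about 0.011 and 0.021), part of any difference between the two estimates reflects the discreteness of the criterion rather than N itself - the effect described above as needing to be factored out.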

Chris

(2018-01-21, 09:05 AM)Chris Wrote: [ -> ]Thanks for drawing this to our attention. I see this includes the data files for the experiments published in "Feeling the Future", which Daryl Bem has agreed to make freely available for analysis, and detailed information in the emails about the timing and composition of the experiments. 

The data files include the date on which each experimental session took place, and this provides a much clearer picture of the sequence in which the experiments were done.

It also sheds light on the accusation sometimes made against Bem that he combined studies that were originally intended to be separate. This suggestion was based either on variations of the protocol within each experiment noted in his 2011 paper, or on an earlier publication in 2003, in which the studies are broken down into smaller units, called 101, 102, 201, 202, 203 and so on.

My impression so far is that there is less force in this criticism than might have appeared previously. Admittedly there are problems with the earliest two studies, done in 2002, later known as Experiments 5 and 6. Experiment 5 does seem to have started out with several hypotheses in mind, and the first 50 sessions of Experiment 6 were originally associated with those of Experiment 5. 

But on the other hand, in several cases where the 2011 paper notes a change of protocol, the data files show that the sessions were done in a continuous series without a break. That is the case for the remainder of Experiment 6 (participants 51-150), which is made up of three sections with different protocols, called 201, 202 and 203 in the 2003 publication. It's also the case for Experiment 2 (150 participants), where there was a change of protocol after the first 100 participants. The lack of a break in the series of sessions does seem consistent with the intention that they should form part of a single experiment.

In the other cases where there was a change of protocol, this did coincide with a break in the series of sessions. But those breaks were in the summer - or in one case over Easter - and may have occurred because the participants, who were students, were away on vacation.

In considering the scope for large numbers of unreported sessions, it's also worth noting the time period covered by these data files. The recent Slate article claimed that Bem's studies spanned a decade, but that's not borne out by the dates in the files. The earliest two experiments were done in 2002, and the remainder between March 2005 and December 2009.

Chris

(2018-01-07, 07:29 PM)Chris Wrote: [ -> ](1) You mention 10 attempted replications as collectively unsuccessful, but the meta-analysis you cite - Bem et al. (2015) - analysed 80 attempted replications, and claimed that as a whole they were successful even when Bem's own studies were excluded. Perhaps there's a reason you don't accept this claim, but shouldn't it be discussed?

Daryl Bem has now commented himself, referring in more detail to the meta-analysis in which the results remained significant when Bem's original results were excluded. It also mentions something I had missed before - that in the "exact replications" using software supplied by Bem, the data were encrypted to prevent them being modified by the experimenters or their assistants:
https://replicationindex.wordpress.com/2...mment-3448

Chris

I don't know how interested people are, but the discussion is continuing. 

Linda has compared the statistics of Bem's experiment 101 (published in 2003) and of the first 50 participants in his experiment 5 (published in 2011), and has concluded that - as suspected previously - they are the same. (There is a factor of 10 difference in one statistic, but I assume that's a typo.) 

Because 101 involved 6 classes of images, but the description of 5 made it sound as though there was only one class of images plus controls, Linda says "I think this may be sufficient evidence to consider calling for a retraction." 
https://replicationindex.wordpress.com/2...mment-3486
Iyace

Looks like the retraction letter was sent.

Chris

(2018-02-02, 05:29 AM)Iyace Wrote: [ -> ]Looks like the retraction letter was sent.

Has that been announced somewhere? I can't see anything on Prof. Schimmack's blog, but I don't find it the easiest to navigate.
Laird

Yep, it's in an update right up the top of the original blog post calling for retraction.

Chris

(2018-02-02, 09:40 AM)Laird Wrote: [ -> ]Yep, it's in an update right up the top of the original blog post calling for retraction.

Thanks. Prof. Schimmack previously circulated a draft of the letter for comment, including to some of those who had commented on his blog. I see that three more people have put their names to the letter - Dr Linda Schultz, Dr Rickard Carlsson and Dr Stefan Schmukle.

My own comments on the first draft of the letter were as follows:

I'm afraid I don't yet feel I understand what's going on with these experiments, and I think I will need quite a long and detailed look at the data before I stand a chance of doing that. 

I do think from what we know already that it would be reasonable for the author to correct the description of Experiment 5 that says there were no positive images, and also to clarify that the designs of that experiment and of Experiment 1 allowed for the testing of additional hypotheses that turned out not to produce significant results. I think those additional hypotheses should have been declared in the file-drawer section, or preferably in the main results sections. (However, I do find it difficult to believe that Bem intended the description of Experiment 5 to deceive the reader, considering that he had just quoted the instructions to participants, which referred to a range of images including "very pleasant" ones. Obviously it's unfortunate that the referees didn't pick up on that contradiction, or on the obvious availability of alternative hypotheses in Experiment 1 as described in the paper.)

My other thought is that the statistical peculiarities you number 1-3, and the decline effect you have found, obviously show that the data don't reflect a well-behaved phenomenon in which the effect size remains steady and individual trials are statistically independent. The trouble is that where psi is concerned it's not clear whether that is enough to conclude that the data are spurious, because we don't know what the statistical properties of psi should be. We know how the statistics should behave in the absence of psi, so testing the null hypothesis is a rational way of proceeding, but doing the converse - assuming psi exists and testing whether the data are consistent with that assumption - doesn't necessarily make sense. I realise that's the kind of argument sceptics find infuriating, but if we allow for the possibility of things like psi experimenter effects, then psi really may not be statistically well behaved at all.

I think if it were possible to come up with a definite "recipe" by which the observed decline effect and the statistical properties 1-3 could be generated from chance data, then your case would become much stronger - though after looking at the dates in the data file I was left with the impression that there wasn't really time to fit in the kind of numbers of unreported participants that would be needed in the scenario you suggested. 

If the letter had been in its present form, I'd also have commented that - while Schimmack's scenario of discarding pilot experiments if unsuccessful and continuing them if successful would, in qualitative terms, produce a decline effect - I'm not convinced it would produce a decline effect which quantitatively matched the observed one.
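To make that last point concrete, here is a minimal sketch (hypothetical code, with arbitrary parameters) of the selection scenario in question: generate pure chance data, keep only those studies whose first 10 trials reach the one-sided p = 0.05 criterion of 9 hits, and extend the survivors to 100 trials.

```python
import random

def selected_studies(n_pilot=10, n_full=100, k_crit=9, n_keep=20, seed=2):
    """Simulate the discard-failed-pilots scenario on pure chance data.

    Studies whose first n_pilot trials reach k_crit hits (roughly the
    one-sided p = 0.05 criterion for 10 trials) are kept and extended to
    n_full trials; all other pilots are discarded. Returns the mean hit
    rate of the kept studies in the pilot portion and in the remainder."""
    rng = random.Random(seed)
    pilot_rates, later_rates = [], []
    while len(pilot_rates) < n_keep:
        trials = [rng.random() < 0.5 for _ in range(n_full)]
        if sum(trials[:n_pilot]) >= k_crit:  # only "successful" pilots survive
            pilot_rates.append(sum(trials[:n_pilot]) / n_pilot)
            later_rates.append(sum(trials[n_pilot:]) / (n_full - n_pilot))
    return sum(pilot_rates) / n_keep, sum(later_rates) / n_keep

early, late = selected_studies()
print(f"pilot hit rate {early:.2f}, remainder {late:.2f}")
```

The decline here is entirely a selection artefact: each surviving pilot has at least 9 hits out of 10 by construction, while the extension trials are unselected coin flips and so sit near 50%. That gives a decline effect in qualitative terms - which is exactly why the quantitative match to the observed decline would still need to be checked.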