Dean Radin preprint on "Tricking the Trickster"


Courtesy of the SPR Facebook page, here's a preprint by Dean Radin at PsyArXiv: 
https://psyarxiv.com/9thae/

It's a reanalysis of two large forced-choice online psi experiments, inspired by the ideas of a group of Christian Scientists known as Spindrift. When the experiments were analysed by the conventional hit rate, each was statistically significant on its own, though not dramatically so, and the combined result was not significant.

But under the non-conventional analysis - in which the trials are divided into separate sequences according to the position of the subject's guess, and the number of alternations between hits and misses in each sequence is counted - the results are wildly significant (z = 10.6).

Obviously if the analysis is valid this is very important, but I think the effects of response bias and optional stopping (in combination) do need to be considered very carefully.
(2018-08-20, 09:19 PM)Chris Wrote: Obviously if the analysis is valid this is very important, but I think the effects of response bias and optional stopping (in combination) do need to be considered very carefully.

At the moment I'm thinking it doesn't look hard to formulate a simple model of optional stopping that would produce an effect in the observed direction.
It seems to me that the results described in this preprint may indeed reflect a statistical artefact related to optional stopping, rather than a genuine effect.

If I understand correctly, what's been done is this. The sequence of results of the forced-choice psi experiment, which had five options, has first been split into separate sequences for different participants. Then each of these has been divided into five sequences according to which of the five options the participant guessed was the target. Then, for each of these five sequences, the analysis proceeds by counting how many times a correct guess is followed by an incorrect one or vice versa. These five numbers are added together to give a total for each participant, and finally the participants' totals are added to give a grand total for all the participants.
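If that reading is right, here's a minimal sketch of the statistic in Python (the data format - a list of (guess, hit) pairs per participant - is my own assumption, not anything from the preprint):

Code:
def transitions(seq):
    """Count the hit->miss or miss->hit alternations in a list of booleans."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def participant_statistic(trials, n_options=5):
    """Split one participant's (guess, hit) trials into one sequence per
    guessed option, then total the alternations within those sequences."""
    by_option = [[] for _ in range(n_options)]
    for guess, hit in trials:
        by_option[guess].append(hit)
    return sum(transitions(seq) for seq in by_option)

# Grand total for the whole data set:
#   sum(participant_statistic(t) for t in all_participants)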

The preprint notes that there is clear evidence of optional stopping - that is, participants tended to terminate the session early when the number of correct guesses was below average. It's well known that optional stopping doesn't present a problem for analysing the overall hit rate provided all the trials are included, including trials from sessions that ended early. Then the experiment can just be viewed as a random number generator spitting out a long series of targets, with different participants taking it in turn to guess. On the no-psi hypothesis (and with adequate randomisation, of course) all the trials are independent and have a 20% chance of success, and the coming and going of the participants is irrelevant.

The same strategy could be applied to Dean Radin's "sequential" statistic, described above. The trials for all the participants would be formed into a single long sequence, and the total number of correct/incorrect or incorrect/correct transitions would be calculated. That would result in an unbiased statistic with the right expected value.
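In terms of the sketch above, the only change needed is to concatenate everyone's trials in chronological order before splitting by guessed option, so that the pairs spanning participant boundaries are retained (I've kept the split by option, since dropping the boundary pairs is the only thing being changed):

Code:
def pooled_statistic(all_participants, n_options=5):
    """The same alternation count, but computed over one long sequence
    formed from all participants' trials in order, so that pairs spanning
    the boundary between one participant's last guesses and the next
    participant's first guesses are kept rather than dropped."""
    stream = [t for trials in all_participants for t in trials]
    by_option = [[] for _ in range(n_options)]
    for guess, hit in stream:
        by_option[guess].append(hit)
    return sum(transitions(seq) for seq in by_option)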

Unfortunately, that's not quite what Radin is doing, because (if I understand correctly) he's separating the trials of different participants first. So he omits from his statistic the transitions between the final guesses of one participant and the initial guesses of the next. If - as a result of optional stopping - the final guesses are less likely to be correct than would otherwise be expected, that will affect the number of those omitted transitions. It turns out that it reduces their expected number: with a 20% hit rate, a pair alternates with probability 0.8 when its first member is a hit, but only 0.2 when its first member is a miss, so pairs beginning with the miss-prone final guesses alternate less often than chance expectation.

That means that - while the statistic based on all the transitions is unbiased - Radin is omitting a group of transitions which tends to be below overall expectation. So what he is left with tends to be above expectation. That is the direction of the effect he has reported. So it may well be a result of optional stopping.
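As a sanity check on the direction of the effect, here is a toy Monte Carlo model - my own crude stopping rule, not anything taken from the preprint, and ignoring the split by guessed option for simplicity, since the mechanism (cutting off miss-heavy session endings) is the same. It also checks in passing that the pooled hit rate itself remains unbiased:

Code:
import random

P = 0.2                      # hit probability on the no-psi hypothesis
Q = 2 * P * (1 - P)          # theoretical alternation rate, 0.32

def session(rng, max_len=20):
    """One simulated subject: guess at random, quit early when doing badly."""
    outcomes, hits = [], 0
    for i in range(max_len):
        hit = rng.random() < P
        hits += hit
        outcomes.append(hit)
        if i >= 4 and hits / (i + 1) < 0.10:   # negative optional stopping
            break
    return outcomes

rng = random.Random(1)
hits = trials = alternations = pairs = 0
for _ in range(200_000):
    s = session(rng)
    hits += sum(s)
    trials += len(s)
    alternations += sum(a != b for a, b in zip(s, s[1:]))
    pairs += len(s) - 1

print(f"hit rate:         {hits / trials:.4f} (0.2000 expected - unbiased)")
print(f"alternation rate: {alternations / pairs:.4f} ({Q:.4f} if unbiased)")

If the argument above is right, the first figure should stay at 0.20, while the second should come out slightly above 0.32 - the same direction as the preprint's result.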

Radin does argue (on page 7 of the preprint) against its being an artefact of optional stopping. He points out that when average results for sessions containing different numbers of trials are considered, there is a positive correlation between the number of correct guesses and the number of correct/incorrect or incorrect/correct transitions. Then he says that while the overall percentage of correct guesses is slightly below expectation, the overall percentage of transitions is above expectation. But I don't see that this rules out the effect being the result of optional stopping.

Anyhow, the mechanism suggested above can be tested easily enough, by including the omitted transitions between different participants, and seeing whether that does eliminate the statistical significance of the result.
I did a small calculation of the amount of bias that could be produced by the optional stopping mechanism described above.

I worked in terms of the following quantities:
(1) P, the theoretical probability of a correct guess (equal to one over the number of positions available for guessing, N),
(2) Mc, the number of trials in a completed session, and Mi, the average number of trials in sessions that were terminated early,
(3) Pc, the fraction of the final guesses that were correct in sessions that were terminated early ("final guesses" means the last guess for each of the N available positions),
(4) Kc, the average number of positions for which there was at least one guess in completed sessions, and Ki, the same for sessions that were terminated early, and
(5) F, the fraction of sessions that were terminated early.

Then the theoretical probability of a pair of consecutive guesses being either hit/miss or miss/hit is

Q = 2 P (1-P)

And if I have done the calculation right, then optional stopping changes this to Q + B, where

B = (1 - 2 P) F Ki (P - Pc) / (Mc - Kc + F [Mi - Ki])

Or if we assume that only a small fraction of sessions are terminated early, this can be approximated by:

B = (1 - 2 P) F Ki (P - Pc) / (Mc - Kc)

Unfortunately we don't know the values of all these parameters. What we know is this:
(1) The number of positions is 5, so P = 0.2.
(2) The caption of figure 7 says Mc was most commonly 20. 
(3) From the same figure, the fraction of all guesses that were correct in sessions that were terminated early was around 0.15, so we can take P - Pc to be 0.05 as a rough estimate.
(4) Kc and Ki would be 5 if there were guesses for every position in every session. In practice there will be some sessions for which there are no guesses for some positions, particularly for sessions terminated early and particularly given response bias. But as a rough estimate we can take Kc and Ki to be 5.

Putting these figures into the equation we get

B = 0.01 F

where F is the fraction of sessions that were terminated early. 

I can't see a figure for F in the paper, but the overall result presented was that Q + B was equal to 0.3205; since Q = 0.32, this means

B = 0.0005

What we can say is that the percentage of sessions terminated prematurely that would be required to produce this result is roughly 5%, which doesn't seem implausibly high for unsupervised online experiments. But obviously the suggested mechanism can be tested only by looking at the data.
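For concreteness, here is that arithmetic as a quick check (the values are the rough estimates listed above, not data taken from the paper):

Code:
P, Mc, Kc, Ki = 0.2, 20, 5, 5
P_minus_Pc = 0.05                  # rough estimate from figure 7

B_per_F = (1 - 2 * P) * Ki * P_minus_Pc / (Mc - Kc)   # 0.01
B = 0.3205 - 2 * P * (1 - P)                          # 0.3205 - 0.32 = 0.0005
F = B / B_per_F                                       # about 0.05, i.e. 5%
print(B_per_F, B, F)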
(2018-08-23, 08:33 AM)Chris Wrote: What we can say is that the percentage of sessions terminated prematurely that would be required to produce this result is roughly 5%, which doesn't seem implausibly high for unsupervised online experiments.
I haven't attempted to check all the mathematical reasoning above. However, on the subject of unsupervised online activity, I'm often surprised by statistics from YouTube showing what percentage of each individual video has been watched. Attention seems to trail off rapidly. I'm well aware that this is a completely different subject, but it seems even the most enthusiastic viewers typically watch only 30% to 50% of a video. Few make it all the way to the end; those who do are the exceptions. I should note I'm mainly talking about music videos - the figures for other genres may differ markedly.

Quote:But obviously the suggested mechanism can be tested only by looking at the data.
Quite so.
(2018-08-21, 08:28 PM)Chris Wrote: The same strategy could be applied to Dean Radin's "sequential" statistic, described above. The trials for all the participants would be formed into a single long sequence, and the total number of correct/incorrect or incorrect/correct transitions would be calculated. That would result in an unbiased statistic with the right expected value.

I disagree with this approach. The whole purpose of Dean's study is to measure the frequency of naturally paired outcomes resulting in a transition (e.g. hit, miss or miss, hit). I'm sorry, but stringing together individual sessions, in order to artificially pair off the final trial outcomes of some with the initial outcomes of others, will only result in adding considerable noise to the study.
As I recall from the early years of the GotPsi experiment, optional stopping was far more common than Dean and the late Richard Shoup ever anticipated. I have a distinct memory of one of them stating his surprise at its high prevalence. Additionally, there was a second type of optional stopping that concerned them: stopping while ahead. Numerous test subjects would finish a run of, say, 25 trials, then start another run but not complete it, quitting after getting one or more quick hits. This "positive optional stopping" behavior was quite evident in the daily halls of fame, where the trial totals of individual subjects often strayed from multiples of 10 or 25 (two of the three original run-length options).

To try to lower the incidence of positive optional stopping, Dean and Richard added an option for doing exactly 100 trials of the Card Test. Richard described it to us test subjects in this GotPsi Yahoo group message:

Quote:[...] In the Card Test, we have added another option for Trials per Run -- 100 trials, called a "standard experimental run".  If you choose this option, you MUST complete exactly 100 trials, no more, no less.  If this proves popular, we'll add it to some other tests as well. 

As you may guess, this is to encourage runs without any "optional stopping", where users sometimes click until they get a high score just by chance and then stop.

https://groups.yahoo.com/neo/groups/gotp...ages/14170  (Jan 10, 2006)

Regarding "negative optional stopping", it's difficult to get a handle on its prevalence without access to the database. This is because the halls of fame only list results from users who've done 20 or more trials in a given day.
(2018-08-24, 03:11 AM)Doug Wrote: I disagree with this approach. The whole purpose of Dean's study is to measure the frequency of naturally paired outcomes resulting in a transition (e.g. hit, miss or miss, hit). I'm sorry, but stringing together individual sessions, in order to artificially pair off the final trial outcomes of some with the initial outcomes of others, will only result in adding considerable noise to the study.

The problem is that the statistic Dean Radin has used is prone to bias due to optional stopping, so something needs to be done to correct that, otherwise we won't know whether there's a real effect there or not. Radin went to quite a bit of trouble in the preprint to test for possible artefacts. Using the statistic I've suggested is a straightforward way of testing for the effect of optional stopping, which would be capable of ruling it out as an explanation.  

On the assumption that the result of Radin's analysis is a real effect, it's true that using this statistic would add about an extra 25% of random data. As the overall Z value he found is 10.6, I can't believe that would wipe out a genuine effect. But anyway something needs to be done, because the analysis in the preprint only leaves us guessing.
(2018-08-24, 03:42 AM)Doug Wrote: Regarding "negative optional stopping", it's difficult to get a handle on its prevalence without access to the database. This is because the halls of fame only list results from users who've done 20 or more trials in a given day.

I think it's clear from figure 7 of the preprint that negative optional stopping was more common than positive optional stopping, because the average success rate for sessions of fewer than 20 trials was more like 15% than 20%.
(2018-08-23, 08:33 AM)Chris Wrote: And if I have done the calculation right, then optional stopping changes this to Q + B, where

B = (1 - 2 P) F Ki (P - Pc) / (Mc - Kc + F [Mi - Ki])

Actually, thinking a bit more about it, I didn't get that calculation right.

I had been lazily thinking that the fraction of correct "final guesses" for completed sessions would be equal to the theoretical value P. That would be true for the very last guess of a completed session, but the guesses are divided into separate sequences according to the position of the guess. So guesses before the very last one are also relevant, and for those the fraction of correct "final guesses" will be higher than the theoretical value (compensating for the lower fraction of correct guesses in the sessions that were terminated early).

So in that equation there should also be a contribution from the "final guesses" of completed sessions. That will reduce the bias, but not eliminate it. I'd expect the fraction of sessions terminated early needed to account for the observed result to be higher than the 5% I estimated. But I don't think there's any point trying to correct that estimate, because too many of the required data are unknown.
