Dean Radin's 1989 Neural Networks paper


[This thread was originally posted in the "Skeptic vs. Proponent" forum.]

This paper was referred to by Jeffrey Mishlove in his recent interview with Roger Nelson. In it, Dean Radin claims to have trained an artificial neural network using micro-PK data, so that the network was able to identify individual participants from their data more accurately than would be expected by chance:
http://deanradin.com/articles/1989%20neu...rk%201.pdf

This is what he did. The data were from PEAR Lab experiments in which participants tried to influence the behaviour of random number generators under three different conditions - high, low and neutral. The result of each trial was the sum of 200 randomly generated bits. 50 trials made up a run, and a series of 50 runs in each of the three conditions made up a full experimental session.

The first part of the paper uses one session for each of 32 participants. Each session was divided into two halves. One contained the first 25 runs of each series, and was used to train an artificial neural network to associate data with individual participants. The other half contained the last 25 runs of each series, and was used to test whether the neural network could really recognise the "signatures" left in the data by different participants. The simplest way of looking at this is to consider how many of the participants the network managed to identify correctly from the second half of the session. (Radin also presents results for some more complicated ways of looking at it.) There were 32 participants, and on the null hypothesis the whole process must be equivalent to picking each one at random, so the probability of getting each one right is 1/32. So on average we'd expect just 1 participant to be correctly identified by chance.

The other key feature of the experiment is that the training of the neural network is done by an iterative process starting from a randomised initial state. The neural network is highly non-linear, so if the initial state is changed, this process leads to a different final state. So Radin carried out the whole procedure of training and testing 100 times, starting with 100 different random initial conditions. The end result is one figure for the average number of participants identified correctly (i.e. the number averaged over these 100 repetitions of the process) and another figure for the standard deviation of the number of participants identified correctly.
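To make the shape of that procedure concrete, here is a rough structural sketch of my own (not Radin's code - scikit-learn's MLPClassifier stands in for his network, placeholder random numbers stand in for the summary statistics computed from each half-session, and the hidden-layer size is just an assumption):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
n_participants, n_features = 32, 6   # a handful of summary values per half-session (assumed)

# Placeholder inputs: in the actual study these would be statistics computed
# from the first and second halves of each participant's session.
X_train = rng.normal(size=(n_participants, n_features))   # first halves (training)
X_test = rng.normal(size=(n_participants, n_features))    # second halves (testing)
labels = np.arange(n_participants)

correct_counts = []
for seed in range(100):                            # 100 different random initial states
    net = MLPClassifier(hidden_layer_sizes=(10,),  # stand-in architecture, not Radin's
                        max_iter=2000, random_state=seed)
    net.fit(X_train, labels)                       # train on the first-half "signatures"
    correct_counts.append(int((net.predict(X_test) == labels).sum()))

# The two figures reported for each network: the mean and standard deviation
# of the number of correct identifications over the 100 retrainings.
print(np.mean(correct_counts), np.std(correct_counts))
```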

He tried three different neural networks, and for the first one, the average number of correct identifications was 1.46 and the standard deviation was 1.13. The average is higher than 1, the number we'd expect by chance. But the question is whether that difference is statistically significant.

To try to answer that question, Radin generated a set of control data using a pseudo-random number generator, and applied exactly the same procedure to that. (He also did the same for a "scrambled" version of the experimental data, but the principle is similar.) The result was that the average number of correct identifications was 1.02 and the standard deviation was 0.91. Based on the averages and standard deviations for the PEAR data and the pseudo-random control data, he calculated a t statistic of 3.02 (for 198 degrees of freedom), which corresponds to a highly significant p value of 0.0014.
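Incidentally, that t value is easy to reconstruct from the reported means and standard deviations. A quick sketch, assuming an ordinary two-sample t test with 100 repetitions in each group (which is what the 198 degrees of freedom imply):

```python
from math import sqrt
from scipy import stats

n = 100                      # 100 repetitions of training and testing per condition
m_pear, s_pear = 1.46, 1.13  # PEAR data: mean and SD of correct identifications
m_ctrl, s_ctrl = 1.02, 0.91  # pseudo-random control data

t = (m_pear - m_ctrl) / sqrt(s_pear**2 / n + s_ctrl**2 / n)
p = stats.t.sf(t, df=2 * n - 2)   # one-tailed p, 198 degrees of freedom

# Prints roughly 3.03 and 0.0014 - essentially the paper's 3.02 and 0.0014,
# with the small difference down to rounding of the reported means and SDs.
print(round(t, 2), round(p, 4))
```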

That seems to answer the question, but is that answer statistically valid?
(2019-01-24, 05:49 PM)Chris Wrote: He tried three different neural networks, and for the first one, the average number of correct identifications was 1.46 and the standard deviation was 1.13. The average is higher than 1, the number we'd expect by chance. But the question is whether that difference is statistically significant.

To try to answer that question, Radin generated a set of control data using a pseudo-random number generator, and applied exactly the same procedure to that. (He also did the same for a "scrambled" version of the experimental data, but the principle is similar.) The result was that the average number of correct identifications was 1.02 and the standard deviation was 0.91. Based on the averages and standard deviations for the PEAR data and the pseudo-random control data, he calculated a t statistic of 3.02 (for 198 degrees of freedom), which corresponds to a highly significant p value of 0.0014.

That seems to answer the question, but is that answer statistically valid?

I had been hoping some other people would offer opinions, but perhaps this is too statistical for most people's tastes.

My opinion is that the statistical analysis in this paper is not valid, and that it produces the appearance of statistical significance where there isn't any.

The idea of the study is that each participant leaves a kind of "signature" in the data produced by the random number generators, so for each individual there should be a similarity between the characteristics of the first half of the session (used to train the neural network) and the second half of the session (used to test the neural network). The training and testing procedure generates a number that quantifies that similarity - the number of participants correctly identified from the second halves of their sessions.

But because the training procedure starts with a random configuration of the neural network, the number produced is itself random. For that reason, Radin repeated the training and testing procedure 100 times, to try to average out the contribution of that randomness and to obtain a number that reflects the properties of the underlying data.

The t statistic is calculated using a measure of the variance of the result over the 100 repetitions of the training and testing procedure. So a large t value means that the average difference obtained between the PEAR data and the pseudo-random control data reflects a genuine difference in the underlying data, rather than random contributions from the neural network.

So the average value of 1.46 correctly matched participants in the PEAR data is genuinely different from 1.02 in the control data, not an artefact that the neural network is manufacturing. But that's not the same as saying that it's a statistically significant difference - that the value of 1.46 would be unlikely to arise by chance in randomly generated data. To determine that, we'd need to generate a large number of sets of control data - not just one set - and apply the same training and testing procedure to all of them. Then we'd need to look at the whole distribution of the number of correct matches of participants obtained, and to see where in that distribution lies the value of 1.46 obtained from the PEAR data. If it's an extreme value relative to the distribution, that means it's statistically significant. If it's somewhere in the middle, it isn't significant.

Radin didn't do that, so we can't be sure what the answer would be. But we do know that on the null hypothesis any procedure that simply tried to match participants based on the two halves of the data would be equivalent to picking them at random with uniform probability. So the number of correct matches would be binomially distributed with the number of trials being 32 and the probability of success in each trial being 1/32. It turns out that for statistical significance at the 0.05 level, that would require 4 or more correct matches. (The probability of 3 or more matches is 0.077; the probability of 4 or more is 0.017.)
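Those binomial figures are straightforward to check, for instance with scipy (assuming, as above, that matching is done with replacement):

```python
from scipy.stats import binom

n, p = 32, 1 / 32   # 32 participants, each correctly matched with probability 1/32 by chance

print(binom.sf(2, n, p))   # P(3 or more correct matches) - about 0.077
print(binom.sf(3, n, p))   # P(4 or more correct matches) - about 0.017

# Smallest number of correct matches that would be significant at the 0.05 level
print(min(k for k in range(n + 1) if binom.sf(k - 1, n, p) < 0.05))   # 4
```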

So I think we can be pretty sure that the average number of correct matches produced by 100 repetitions of training and testing using the neural network - 1.46 - is not statistically significant.
(2019-01-24, 07:27 PM)Max_B Wrote: I did look at the paper twice (both times you posted it). But I couldn't understand the reasoning for condensing the PEAR RNG data into 6 different statistical values, and why that might be meaningful. Neither could I understand why some significance was inferred when these six values were then compared with some other RNG values Radin had produced as a 'control'. But as you say, the whole paper was statistical and way beyond my expertise. This is now, I think, the 5th Radin paper I've looked at, and it appears to be IONS's promotional garbage.

I think he used six statistical measures of the RNG data just for convenience, because the neural network approach was feasible only with a small number of input parameters.

But I think he was expecting too much anyway, considering that the apparent effect size is so small in these microPK experiments.
(2019-01-24, 07:44 PM)Chris Wrote: But I think he was expecting too much anyway, considering that the apparent effect size is so small in these microPK experiments.

York Dobyns (in Broderick and Goertzel, "Evidence for Psi") quotes an overall effect size for PEAR's "Basic REG" experiments of about 0.18 per million bits. I'm not sure whether I'm counting in exactly the same way, but I think Radin's data have a million bits per participant in the high and low conditions. So with only 32 participants he couldn't really expect to achieve significance.

To make matters worse, according to Varvoglis and Bancel (in "Handbook for the 21st Century"), the PEAR benchmark data were produced by 91 participants, two of whom produced highly significant results (and a quarter of the data), but the other 89 of whom didn't even achieve significance collectively. On that basis the neural network could hardly be expected to distinguish any of the 89. The best that could reasonably be hoped for would be to identify correctly the two star performers (if those two contributed to the data used by Radin).
Anyone know if Radin has ever published a negative study?
(2019-01-24, 10:55 PM)malf Wrote: Anyone know if Radin has ever published a negative study?

He's no square.
(2019-01-24, 05:50 PM)Chris Wrote: But we do know that on the null hypothesis any procedure that simply tried to match participants based on the two halves of the data would be equivalent to picking them at random with uniform probability. So the number of correct matches would be binomially distributed with the number of trials being 32 and the probability of success in each trial being 1/32.

Would it, though, Chris? Wouldn't a binomial distribution assume independence between trials, which isn't the case here (because whether or not you pick a correct match on trial #n affects the probability that you will (be able to) pick a correct match on trial #n+1)?
(2019-01-26, 06:00 AM)Laird Wrote: Would it, though, Chris? Wouldn't a binomial distribution assume independence between trials, which isn't the case here (because whether or not you pick a correct match on trial #n affects the probability that you will (be able to) pick a correct match on trial #n+1)?

Yes. Sorry, I should have specified a matching procedure that tries to find a match for each participant's second-half data independently  (like sampling with replacement). The number of matches under that condition is binomially distributed, and it is the appropriate condition to compare with Radin's result, because his neural network is applied independently to each participant's second-half data. So it can suggest the same participant for more than one set of data.
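A quick simulation of that independent matching scheme bears this out - just a sketch, with uniform random guesses standing in for the network's suggestions under the null hypothesis:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(2)
n = 32

# Match each participant's second-half data independently: 32 uniform guesses,
# with the same participant allowed to be suggested more than once.
guesses = rng.integers(0, n, size=(200_000, n))
matches = (guesses == np.arange(n)).sum(axis=1)

print(np.mean(matches >= 4))   # empirical P(4 or more correct matches)
print(binom.sf(3, n, 1 / n))   # the binomial value it should agree with, about 0.017
```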
(2019-01-26, 09:00 AM)Chris Wrote: Yes. Sorry, I should have specified a matching procedure that tries to find a match for each participant's second-half data independently  (like sampling with replacement). The number of matches under that condition is binomially distributed, and it is the appropriate condition to compare with Radin's result, because his neural network is applied independently to each participant's second-half data. So it can suggest the same participant for more than one set of data.

Ah, thanks. I hadn't read the paper and probably should have indicated that - I had simply assumed sampling without replacement. But anyhow, based on the formula supplied on this page, your conclusion wouldn't change even if the sampling had been without replacement; only the probabilities would change. That is, it would still take 4 or more correct matches to reach significance at the 0.05 level. For 3 or more matches the probability without replacement is 0.080, where the binomial probability you cited (with replacement) is 0.077; for 4 or more matches it is 0.019 without replacement versus 0.017 with replacement.

I didn't check the derivation of the formula, but I did confirm that it gives the right results for n=4.
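Here is a quick numerical check of both sets of figures - the with-replacement case via the binomial, and the without-replacement case modelled (on my understanding of it) as a random one-to-one assignment, i.e. counting the fixed points of random permutations:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(3)
n = 32

# With replacement (your binomial): tail probabilities for 3+ and 4+ matches.
print(binom.sf(2, n, 1 / n), binom.sf(3, n, 1 / n))        # ~0.077 and ~0.017

# Without replacement: each second-half dataset gets a distinct participant,
# i.e. a random permutation; correct matches are its fixed points.
perms = np.array([rng.permutation(n) for _ in range(200_000)])
fixed = (perms == np.arange(n)).sum(axis=1)
print(np.mean(fixed >= 3), np.mean(fixed >= 4))            # ~0.080 and ~0.019
```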

So, your criticism seems sound. That said, it seems unfortunate that parapsychologists like Dean cop it no matter what: if, as in the Ganzfeld, they compare with theoretical expectation, then they are criticised for not comparing with controls; if, as in studies like this, they compare with controls, then they are criticised for not comparing with theoretical expectation! Probably the best approach would be to do both, but even then, can we be sure that they would not still be criticised?
